perm filename CHPT.6[LSP,JRA]3 blob
sn#293478 filedate 1977-06-29 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00029 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00004 00002 .SEC(The Dynamic Structure of LISP,Dynamic Structure,,P107:)
C00027 00003 .SS(Primitives for LISP,,P230:)
C00042 00004 .SS(%2SM%*: A Simple Machine,%2SM%*,P33:)
C00060 00005 .SS(Implementation of the Primitives,,P234:)
C00075 00006 .SS(Assemblers,Assemblers,P7:)
C00086 00007 .SS(Compilers for Subsets of LISP)
C00094 00008 .SS(Compilation of Conditional Expressions)
C00097 00009 .BEGIN NOFILLGROUP
C00100 00010 .GROUP
C00108 00011 .SS(One-pass Assemblers and Fixups)
C00113 00012 .<<FIXUPS>>
C00123 00013 .SS(Compiling and Interpreting,,P240:)
C00128 00014
C00138 00015 .SS(A compiler for Simple %3eval%*: The Value Stack,compiler,P32:)
C00151 00016 .SS(A Compiler for Simple %3eval%*,compiler,P249:)
C00162 00017 Here is a partial sketch of %3compile%* operating on %3j <= λ[[xy]f[g[x]h[y]]]%*.
C00166 00018 .SS(Efficient Compilation)
C00171 00019 .SS(Efficiency: Primitive Operations,,P166:)
C00177 00020 .SS(Efficiency: Calling Sequences)
C00190 00021 .SS(Efficiency: Predicates)
C00196 00022 .SS(A Compiler for %3progs%*)
C00201 00023 .SS(Further Optimizations,,P258:)
C00210 00024 .SS(Functional Arguments,,P3:)
C00215 00025 .SS(Macros and Special Forms,macros,P57:)
C00221 00026 .SS(Compilation and Variables,non-local variables,P5:)
C00231 00027 .SS(Interactive Programming,,P239:)
C00246 00028 .SS(LISP Editors,,P247:)
C00252 00029 .SS(Debugging in LISP,,P168:)
C00265 ENDMK
C⊗;
.SEC(The Dynamic Structure of LISP,Dynamic Structure,,P107:)
.TURN ON "←";
.SS(Introduction)
As things presently stand we have developed the basic static structure of a LISP
machine consisting
of %3eval%* and its subfunctions, the I/O routines, the garbage
collector, and an initial symbol table which contains the constants. These
constants include primitive
functions, %3T%1 and %3NIL%1, and usually a collection of utility functions
like %3append%1, and %3reverse%1.
We have two areas of memory:#pointer space, and full word space.
Expressions to be evaluated are read in, converted to list structure,
and evaluated by %3eval%*. The evaluation entails traversing the
S-expression representation of the form, interpreting the
information found there as LISP "instructions".
We have discussed some basic data structures for
implementation of λ-bindings and evaluation,
but we have said little about how the actual execution of the expression
takes place. The essential ingredient involved here is the handling
of control -- the dynamics of LISP execution.
For example,
how can we implement call-by-value, parameter passing, recursion, and
conditional expressions?
At a high level, the original %3eval-apply%1 pair of {yonss(P6)}
%6does%1 describe the evaluation mechanism. Given the data structure representation
of an expression, we can mechanically perform the transformations
implied in the %3eval-apply%1 pair. Even more of the detail is made
explicit in the later evaluators of {yonss(P209)} through {yonss(P211)}.
However we must still implement these evaluators on a "real" machine
and, unless the evaluator is built into the hardware, we must
express the evaluator in terms of more primitive operations. For example,
we cannot "implement" recursion by %6using%1 recursion; we must express
that idea in terms of lower level operations. Obviously this
decomposition must stop somewhere. As J.#McCarthy once said: "Nothing
can be explained to a stone"; we must assume certain primitives are known.
In this chapter we will discuss two layers of "primitive operations"
or %2instructions%1. One layer will
correspond to traditional hardware, and another layer will correspond
to the primitives which we derive from the evaluator of {yonss(P211)}.
Here we discuss the primitives of that section as the basis for a machine
which executes LISP expressions. We can describe
the evaluation of a LISP expression as the execution of a
sequence of these instructions. The two operations are equivalent: either
evaluate the expression or execute the instruction sequence.
There are common instances in which the execution of the instructions
can be considered to be "more efficient" than the evaluation of the expression.
For example, consider the access to a local variable. Each such access
is to the same location relative to the local environment. That
relative location can be computed easily, but the
evaluator will use a version of %3lookup%1 for every access.
We %6must%1 resort to %3lookup%1 for non-local variables, since
LISP uses dynamic binding and
the activation environment will typically affect which binding is accessible,
but since the location of any local variable is computable, we should
exploit that knowledge when executing our programs.
Several examples also arise in the evaluation of a %3prog%1. For example
a loop typically consists of a static⊗↓By static we mean that the
actual expressions do not change during execution. Using the fact that
LISP programs are data structures and using
some marginal programming techniques like %3rplaca%1 and %3rplacd%1 ({yonss(P171)})
we can in fact write self modifying programs. However, such practice is
not common.← sequence of statements. Each time around the loop an
evaluator will execute the same sequence of instructions. It would be
faster to simply execute the sequence of instructions rather than
re-interpret each expression. A related efficiency involves the execution of
%3go%1. We assumed in {yonss(P211)} that the evaluator will either lookup
the label by searching the body of the %3prog%1 or, perhaps more efficiently,
searching a computed %3go_list%1. Either case requires a search.
If we can replace the body of a loop with a sequence of primitive instructions,
then we can replace a %3go%1 with a transfer of control to the
beginning of an appropriate block of instructions. Such a transfer operation would
be one of our instructions.
A problem related to the execution of loops is the %6recognition%1 of
a loop.
The extent of --or even the presence of-- a loop
which the user is
controlling by tests and %3go%1's may be difficult to discover. If a loop
is controlled by language constructs (%3while,#do,#repeat%1,#etc.) then the
interpreter should have some chance of improving the execution of the loop.
This, perhaps, is another good reason for removing control of iteration
from the hands of the programmer.
The translation of an expression into a sequence of instructions is not without
cost. If we wish to evaluate a simple expression only once, then
direct evaluation is usually less time-consuming than translation plus
execution.
However expressions subjected to repeated evaluation can profitably be
translated into instructions and executed.
The translation part of the process which we have been describing is called
a %2compiler%1.
It
is a mapping from the LISP expressions to
a sequence of instructions. When the instructions are
carried out they will have the same effect as %3eval%*, interpreting
the original expression.
A compiler is a useful tool for increasing the speed of execution.
J. McCarthy says a compiler allows you to look %6before%1 you leap; we will
show that we can look %6as%1 we leap. That is, we can compile instructions
as we evaluate expressions; if the expression is evaluated again
then we execute the faster compiled version.
The relationship between compilation and interpretation should be becoming
more apparent: the interpreter performs the evaluation; the
compiler emits instructions which when executed produce the same computational
effect as the evaluator.
Since the code produced by the compiler is either in machine language
or in a form closer to the machine than the source program, we can execute
the code much faster. A speed-up factor of thirty to fifty is not uncommon.
Also in LISP we know that programs are stored as S-expressions.
Our compiled code will be machine language, thus freeing a large quantity
of S-expression storage. This will usually make garbage collection
less time consuming.
Why not compile all programs? We already have seen that we can %3cons%*-up
new expressions to be evaluated while we are running. Still we can force
even those expressions through a compiler before execution⊗↓There are,
however, programs which simply %6cannot%* be compiled. The most obscene
examples involve self-modifying programs;
that is, programs which modify their representation in order to
affect the course of interpretation. An example is reluctantly
given on {yon(P161)}.←. The answer,
rather, is that
for debugging and editing of programs it is extremely convenient
to have a structured representation of the program
in memory.
This structured representation also simplifies the discussion of compilation.
It is true that compilers can be written to
go directly
from M-expression representation to internal machine code⊗↓The compilers
which perform in this manner are called syntax directed compilers. They
are an instance of a computational scheme called syntax directed computation;
the idea is based on the observation that many algorithms parallel the
underlying data structure. We have seen this behavior frequently in our
data structure algorithms. For application to compiling and parsing
see ⊗↑[Gri#71]↑ or ⊗↑[Aho#72]↑.←.
Conventional compiler discussions include description of the syntax analysis
problems, for we cannot compile code until we know what the syntactic structure
of the program is. However, syntax analysis is really irrelevant
for a clear understanding of the implementation of a compiler.
Assuming the existence of the structured representation, the compiler
is conceptually very simple.
This structured representation is the S-expr representation in LISP and
resembles a parse tree in other languages.
When
we wish to run the program at top speed, we can compile the programs.
The compiler can then translate the abstract representation of the program
into machine code.
We shall say more about this view of programming later.
We shall exploit the analogy between compilers and evaluators when we write the
LISP function, ⊗→%3compile%*↔←, which will implement the compiler.
We shall do this in two ways. First, the structure of the %3compile%* function
and its subfunctions will be written to capitalize on the knowledge we
have acquired from writing evaluators. Second, we will attempt
to abstract from the specific compiler the essence of all LISP compilers without
losing all of the details.
This second task is more difficult because we have to separate two representations
from the specific compiler. We are representing forms as data structures, and we
are also dealing with the representation of a specific machine.
The task %6is%* worth pursuing since we wish to write different compilers
for the same machine and also a single compiler capable of easy transportation to
other machines.
The input to
%3compile%* is the representation of a LISP function; the output is a list
which represents a sequence of machine instructions. Assume that we have
LISP running on %2Brand X%* machine, and we have written a
%3compile%* which produces code for %2Brand X%* machine. Then perform
the following sequence of steps:
.BEGIN TABIT1(15);GROUP
\ 1. Create the S-expression form of %3compile%*.
\ 2. Introduce this translation into the machine, defining %3compile%*.
\ 3. Ask LISP to evaluate: %3compile[COMPILE]%*.
.END
Since %3compile%* compiles code
for %2Brand X%* machine, it translates the S-expression representation of its
argument
into machine code. Therefore the output of step 3 is a list of instructions
representing the translation of %3compile%*. That is, step 3 compiles
the compiler.
A technique called %2⊗→bootstrapping↔←%* is closely related to the process
just described.
To illustrate this idea,
assume that we have LISP and its compiler running on %2Brand#X%*, and
we wish to implement LISP on %2Brand Y%*. If we have been careful in
our encoding of the %3compile%* function then %6most%* of %3compile%* is
machine independent; that is, it deals mostly with the structure of the
LISP expression and
only in a few places deals with the structure of a particular machine. As we
will see, this is not an unrealistic assumption.
Notice that this is one of our early programming admonitions:
encode algorithms in a
representation-independent style and include representation-dependent
routines to interface with the program. Changing representations simply
requires changing those simpler subfunctions. Here the representations are
of machines and the algorithm is a compiling function for LISP.
Let us call those parts of the compiler
which deal with the machine, the ⊗→code generators↔←
or instruction generators. Now if we understand the
machine organization of brands %2X%* and %2Y%* then for any instruction on %2Brand#X%*
we should be able to give a sequence of instructions having the equivalent
effect on %2Brand#Y%*. We can change the instruction
generators in %3compile%*
to generate instructions which run on %2Brand#Y%*. We would have a %3compile%* function,
call it %3compile*%*, running on %2X%* and producing instructions for %2Y%*.
Take the S-expr representations of %3eval, apply, read, print, compile*,...%* etc.
and pass these through %3compile*%*; in this way we generate a large segment
of a LISP system which will run on %2Y%*. Certain primitives will have to be
supplied to run these instructions on %2Y%*, but a very large part of LISP can be
bootstrapped from %2X%* to %2Y%*.
Given a compiler and interpreter for a language %7L%41%1 we can also
bootstrap %7L%41%1 to a language %7L%42%1. We express the interpreter
for %7L%42%1 as a program in %7L%41%1. We can then execute programs in %7L%42%1
by interpreting the %7L%42%1 interpreter. We can improve efficiency by
compiling the %7L%42%1 evaluator. Perhaps we can express the %7L%42%1
compiler in %7L%41%1 or %7L%42%1; in either case we can then compile that compiler.
This chapter will deal with the implementation of the control structures
of LISP.
We will describe implementations of these features as we develop
compilers for LISP.
This use of compilers has several motivations:
.BEGIN INDENT 0,3;
.GROUP;
%21.%* To describe the compiler we must carefully define the machine
which will execute the instructions which the compiler produces.
The instructions generated by the compiler will reference the control primitives of the
machine, so we might as well build a compiler at the same time we
show the dynamic structure of LISP.
.APART
.GROUP;
%22.%* The design of the compiler shows another non-trivial application
of non-numerical
computation. At the same time we will see how simple it is to
describe such complex algorithms in LISP.
.APART
.GROUP;
%23.%* It will clearly show the relationship between compilation and
evaluation. That is, the LISP function representing the compiler
will very closely parallel the structure of the interpreter, %3eval%*.
If you understand %3eval%*, then the compiler is easy.
.END
As in the previous chapter, we will remain as abstract
as possible without losing the necessary details.
A meaningful
description of compilers entails an understanding of a machine,
so before the actual construction of the compilers, we will
describe a simple machine
with a sufficient instruction set to handle the control structures of LISP.
First we will review and expand the primitives of {yonss(P211)}, emphasizing
their interpretation as machine instructions.
.SS(Primitives for LISP,,P230:)
In our discussion of the evaluators in {yonss(P209)} through {yonss(P211)}
we uncovered more details involved in evaluation of LISP
expressions. In the final evaluator we identified a dozen or so actions.
The underlying
idea was to remove recursion and replace that implicit control structure
with very explicit actions, controlled by a simple loop:
.BEGIN TABIT3(30,38,40);GROUP;SELECT 3;
\loop <= λ[[]prog[[]
\\l\cont[]
\\\go[l] ]]
.END
The variable %3cont%1 was a functional variable, bound to states
and set to the next state by the action of the current state.
This observation marks the beginning of a machine description. What remains
to be done is to separate the actions of the machine from the instructions
it is executing. That is, some of the details of the state transformations
deal with the bookkeeping which the machine is doing to discover what the
expression is, and some of the transformations perform the actual evaluation
of the expression. For example, the manipulation of %3fun%1 and %3args%1
is part of the activity to discover the form of the expression.
The execution of %3send%1 and %3receive%1 is part of the evaluation itself.
The parts of the evaluator involved with the execution of the expression
are called the instructions of the machine. Supplied with an appropriate
execution device, a sequence of these instructions captures the
meaning of the evaluation of an expression. It is the business of this section
to review the evaluators and extract a sufficient set of instructions.
We begin that task with some examples, using %3peval%1 of {yonss(P211)} as
the basic interpreter.
First, the evaluation of a constant %3A%1 involves the recognition that
we have seen a constant; that is part of the control of the evaluator.
We evaluate that constant by %3send[denote[]]%1. The %3denote%1 operation
is still part of the evaluator, but the %3send%1 operation is an instruction.
The execution of %3send[A]%1 performs the evaluation. The %3restore%1 operation
returns the evaluator to its previous state. We must allow for some state-saving
in our repertoire of instructions. The evaluation of a function application,
like %3g[A]%1, involves the evaluation of %3A%1, the calling of %3g%1, and
a means of returning to the computation surrounding %3g[A]%1.
Function calls involve several things: we need space to contain the evaluated
arguments; we need a control mechanism to describe which argument is being
evaluated; we need to suspend a computation such that we can
execute the function with the evaluated arguments; and we must be able to
return to the suspended computation when the function has completed its task.
The necessary ingredients are already present in %3peval%1; we need only
extract and package them. Clearly %3alloc_dest%1 is involved in
getting new space for the evaluated arguments. There is a second required
activity since %3alloc_dest%1 always occurs in conjunction with
a %3save%1 of the current %3dest%1. Therefore we define an instruction
named %3alloc%1 which saves the current destination %6and%1 initializes
a new %3dest%1 block.
Each slot of the destination block is filled by an appropriate %3send%1
operation. Examination of the sub-states of %3evalargs%1#({yon(P231)})
reveals another
machine instruction: %3next[]%1 is used to increment the destination pointer.
Finally, after all the arguments are evaluated, we must make the destination
block the local environment, and call the function. Thus two more instructions:
%3link%1 will attach the destination block as the local environment
and restore the previous dest block;
%3call%1 will call the function, after saving sufficient control information
so that we may return after execution of the function is completed.
For example, consider %3f[g[A];h[B]]%1. Assuming %3f%1 and %3g%1 are
λ-definitions with formal parameters %3[x;y]%1 and %3[z]%1 respectively, and
%3h%1 is a primitive, an instruction sequence might be:
.BEGIN GROUP;SELECT 3;nofill;
alloc[(X Y)]; alloc[(Z)]; send[A];
link[]; call[G]; next[];
alloc[(G1)]; send[B]; link[];
call[H]; link[]; call[F]
.END
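The sequence reads as follows; each %3dest%1 block is written here as its list of
slot names, and %3(G1)%1 is taken to be a generated name for the single argument
slot of the primitive %3h%1:
.BEGIN GROUP;SELECT 3;nofill;
alloc[(X Y)]   %1; save the current dest; a new dest block for the arguments of %3f
alloc[(Z)]     %1; save that dest; a new dest block for the argument of %3g
send[A]        %1; deposit the constant in the slot for %3z
link[]         %1; the %3(Z)%1 block becomes the local environment; dest is again %3(X Y)
call[G]        %1; call %3g%1; its value is sent to the slot for %3x
next[]         %1; advance the destination pointer to the slot for %3y
alloc[(G1)]    %1; a one-slot dest block for the argument of the primitive %3h
send[B]        %1; deposit the constant %3B
link[]         %1; that block becomes the local environment; dest is again %3(X Y)
call[H]        %1; call %3h%1; its value is sent to the slot for %3y
link[]         %1; the completed %3(X Y)%1 block becomes the local environment for %3f
call[F]        %1; call %3f
.END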
There are two classes of instructions to break the sequential
flow of a sequence of instructions: we transfer control when we call or
return from a function; and we transfer control when we execute a conditional
expression.
.P241:
Examination of %3ev2%1, %3ev5%1, and %3ev6%1 ({yon(P232)})
reveals some of the details of a function call-return sequence.
After saving the current environment, restoring the saved destination, and
saving a continuation point, we passed control to the body of the function.
The instruction sequence, representing the body of the function, will
be terminated by a call on %3ret%1. This instruction will
restore the saved environment and return control to the instruction immediately
after the %3call%1. The saved information is governed by the variable
named %3control%1, with %3call%1 adding information, and %3ret%1 removing
information. Before showing how instructions are executed and how %3control%1
is manipulated, we will describe the primitives for conditional expressions.
Examination of the details of %3evcond%1 and its associated functions
({yon(P233)}) exhibits more instructions. We use the evaluator
to evaluate a predicate; we then %3receive%1 the result from the %3dest%1-block.
If that answer is true, we evaluate one path; otherwise we evaluate
another path. We see two instructions here. The first is a test-and-jump
instruction, which we shall call %3receive_test%1; it tests the contents
of the current destination slot and jumps to one instruction sequence
if the result is true, and to (usually) another instruction sequence
if the result is false. The second instruction is a means of jumping
unconditionally to a prescribed instruction sequence. This second instruction is
named %3goto%1.
.GROUP;
.P236:
For example, a conditional expression [p%41%1 → e%41%1; ...;p%4n%1 → e%4n%1]
has a code sequence like:
.BEGIN TABIT2(25,28);GROUP;
\\<instructions for p%41%1>
\\[%3receive_test%1[] → <code for e%41%1>;%3goto[a0]%1]
\\<instructions for p%42%1>
\\ ...
\\<instructions for p%4n%1>
\\[%3receive_test%1[] → <code for e%4n%1>;%3goto[a0]%1]
\\%3err[NO_TRUE_COND_CLAUSE]
\a0 . . .
.END
.APART
Whenever %3receive_test%1 is true we execute a sequence of instructions
and then transfer out of the conditional using the %3goto%1.
We could have treated conditional expressions like special function
calls, saving %3a0%1 as the continuation and restoring it from %3control%1
instead of using %3goto%1.
However conditional expressions don't require that extra generality⊗↓Our
treatment of conditionals is an instance of "open coding" a function call.
That means we replace a possible %3call-ret%1 with the "in-line"
instruction sequence which makes up the body of the function. This
trick gives faster execution, but takes more space. We will see
another instance of "open-coding" when we compile macros in {yonss(P57)}.←.
We can now give a more detailed picture of a device which can execute this
instruction set. A program will be represented as a sequence of
instructions. Some of these instructions may be prefaced with labels.
These labels either represent function names or names used within a conditional
expression. Given a sequence of instructions named %3inst_seq%1, we expect
that they be executed in sequence, unless some transfer of control occurs.
For example, the following program suffices for the execution of such
instruction sequences:
.BEGIN TABIT3(17,33,35);GROUP;SELECT 3;TURN OFF "←";
\loop <= λ[[inst_seq]prog[[i_s;pc]
\\\i_s ← inst_seq;
\\l\[null[i_s] → return[%9`%3halt]]
\\\pc ← first[i_s];
\\\i_s ← rest[i_s];
\\\[is_label[pc] → NIL; %et%3 → pc[]]
\\\go[l] ]]
.END
.BEGIN TURN OFF "←";
If %3loop%1 returns %3HALT%1, then the result of our computation is found
in %3dest%1.
Labels are not executable instructions, and are therefore ignored.
The effect of %3goto%1 is to replace the current instruction sequence
with the sequence which begins immediately after the label which is the
argument to the %3goto%1. The effect of %3call-ret%1 is a bit more complex.
We describe %6only%1 the control aspects of %3call%1, leaving the
other details until later.
Let an instance %3call[fn]%1 be the current instruction; and let
%3is%9'%1 be the current instruction sequence. Note that %3is%9'%1 is the
sequence immediately after the call. We save %3is%9'%1 on %3control%1
by %3control#←#concat[is%9'%3;control]%1; then we set %3i_s%1 to the sequence
beginning at %3fn%1. Execution of %3go[l]%1 sends us to label %3l%1
and we begin executing the body of %3fn%1.
We leave %3fn%1 by executing %3ret%1. This instruction performs
.BEGIN CENTER;
%3i_s#←#first[control];#control#←#rest[control]%1;
.END
and we are back at the instruction following the %3call%1.
.END
Part of the execution of %3call%1 and %3goto%1 involves locating
the desired label. Since we have saved the original instruction sequence
we can search that list for the desired label. We will see more
effective ways for locating labels in {yonss(P7)}.
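For the moment the search could simply scan the saved sequence recursively;
the name %3seek%1 and its error message are only illustrative:
.BEGIN GROUP;SELECT 3;NOFILL;
seek <= λ[[lab;seq][null[seq] → err[UNDEFINED_LABEL];
                    eq[first[seq];lab] → rest[seq];
                    %et%3 → seek[lab;rest[seq]]]]
.END
What %3seek%1 returns is the instruction sequence beginning just after the label,
which is what %3goto%1 and %3call%1 install as the new %3i_s%1.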
.SS(%2SM%*: A Simple Machine,%2SM%*,P33:)
This section describes a simple machine which
has a sufficient instruction set to describe the LISP primitives
in terms of a more conventional machine⊗↓See also [Deu#73].←.
Note that this machine is %6not%* necessary for our
understanding of %3eval%*. %3eval%* is self-descriptive. We need describe
a machine only to discuss implementation and compilation. This, indeed,
is an objection to describing the meaning of programming languages
in terms of a compiler: you must then understand %6two%* things, the
language %6and%* a machine.
The simple machine, %2SM%*, has a similarity to the organization of the
PDP-10 ⊗↑[DEC#69]↑. We need very few features to discuss the interesting
facets of the implementation
of our primitives. Certainly, if we were to undertake a real implementation, many
more instructions would be desirable. Similarly, when we discuss compilation
our %2SM%* suffices, but if we wished to perform %6efficient%* compilation
we would expect to have a better instruction set. The point here
is to understand basic algorithms. When that is accomplished it is
reasonable to examine problems of efficiency and details of implementation.
We address some of the techniques available for optimization of compiler code
in later sections.
%2SM%* has a conventional addressable main memory, including registers,
%3AC1, AC2, ..., ACn%1. These registers, called %2accumulators%*,
can either be used as special pointer registers or addressed as
memory locations %30%1 through %3n%1.
Each memory location is assumed to be large enough to contain
two addresses.
For the sake of discussion, assume the word size is 36 bits.
The mapping of a dotted-pair onto an %2SM%* location is straightforward: the %3car%*
maps to the left-half of the word; the %3cdr%*, to the right.
The addressing space for dotted pairs is therefore 2%818%*.
A memory area is set aside to contain such dotted pairs.
A memory area is also dedicated to full-word space;
all p-names and numbers are stored there.
Parts of %2SM%* memory can be designated as stacks. Each stack is a contiguous
area of memory, and the current top of a stack is referenced by one of the
registers, %3P1, ..., Pj%1; these registers are called
%2stack-pointers%*⊗↓On the PDP-10
a stack pointer must be one of the %3AC%1 registers.←.
The stacks will be used to contain the partial results of calculations and
will contain the information necessary to implement the return from
recursive functions.
In most of the compilers we discuss, a single stack suffices for saving
partial computations, environments, as well as control information. This
single stack will be referred to by %3P%1.
There are only three classes of instructions necessary to describe
our implementation: instructions for constant generation, instructions for
stack manipulation, and instructions for flow of control.
The control instructions and some of the stack instructions refer to the
program counter of %2SM%*. This counter is designated as %2PC%*.
In the following, %3C%1
means "contents of..."; %3ac%* means any accumulator; %3loc%* means
either a memory location or an %3ac%*; and %3mem%* is reserved for memory
references⊗↓Again, in a real PDP-10 %3mem%1 can be an %3ac%1.←.
.GROUP;
Here are the instructions:
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;
MOVEI ac const\C(ac) ← const
PUSH P ac\C(P) ← C(P)+1. %1Increment stack pointer%*
\C(C(P)) ← C(ac).%1 Copy contents of %3ac%* onto top of stack.%3
POP P ac\C(ac) ← C(C(P)). %1Copy top of stack into %3ac.
\C(P) ← C(P)-1. %1Decrement stack pointer.%3
.END
.APART
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
%1The next two instructions are used in function call-return situations.%*
PUSHJ P loc \C(P) ← C(P)+1. %1Increment stack pointer%*
\C(C(P)) ← C(%2PC%*).%1 Place address following the %3PUSHJ%1 in the stack.%3
\C(%2PC%3) ← loc. %1 change control to location %3loc%1.
%3POPJ P \C(%2PC%3) ← C(C(P)). %1Copy top of stack into %2PC%1.
\%3C(P) ← C(P)-1. %1Decrement stack pointer.%1
.END
.BEGIN FILL;SELECT 1;
We have ignored some of the details of stack operations;
each stack operation must consider the boundary conditions
on the storage allocated for the stack.
Any condition which would violate these bounds must be detectable.
If a stack is allocated in a discontinuous fashion#(⊗↑[Bis#74]↑)
then a storage management decision must be made; if the stacks are of fixed
size, then an error must be signaled.
.END
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
.BEGIN INDENT 0,19;FILL;
%3MOVE ac loc \C(ac) ← C(loc).%1
This is an instruction to load a specified %3ac%* with the contents
of %3loc%*. Note %3loc%1 may be an %3ac%1; e.g. %3MOVE#AC%41%3#AC%42%1.
.END
.BEGIN INDENT 0,19;FILL;
%3MOVEM ac loc \C(loc) ← C(ac).%1 Copy contents of %3ac%1 into %3loc%1.
For example, %3MOVEM#AC%41%3#AC%42%3#=#MOVE#AC%42%3#AC%41%1.
.END
.END
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
.P153:
%3SUB ac loc\C(ac) ← C(ac) - C(loc).
%3JUMP mem\C(%2PC%*) ← mem. %1Go to location %3mem%*.
%3JUMPF ac mem\%2if %3C(ac)=%ef%1 %2then%3 C(%2PC%*) ← mem;
.BEGIN INDENT 0,19;FILL;
%3JUMPT ac mem\%2if %3C(ac)≠%Ef%1 %2then %3C(%2PC%*) ← mem.
%1Note that %3JUMPT%1 implements the
coding trick of {yonss(P214)} which maps %et%1 onto everything which is
not false.
.END
.END
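As a small illustration of these definitions, a %3PUSH%1 followed by a %3POP%1
copies one accumulator into another through the stack, and %3MOVEI%1 loads a
constant where %3MOVE%1 loads the contents of a location:
.BEGIN TABIT1(19);TURN OFF "←";SELECT 3;GROUP;
PUSH P 1\C(P) ← C(P)+1; C(C(P)) ← C(AC1)
POP P 2\C(AC2) ← C(C(P)); C(P) ← C(P)-1. %1Net effect: %3C(AC2) ← C(AC1)%1.%3
MOVEI 1 405\C(AC1) ← 405. %1The number %3405%1 itself.%3
MOVE 1 405\C(AC1) ← C(405). %1The contents of location %3405%1.%3
.END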
These instructions are executed by a machine whose basic execution cycle is
something like:
.BEGIN TABIT2(15,18);GROUP;SELECT 3;TURN OFF "←";
\%2l:%3\C(%2IR%3) ← C(C(%2PC%3))
\\C(%2PC%3) ← C(%2PC%3) + 1
\\ %2execute %3C(%2IR%3)
\\%2go to l
.END
The %2IR%*, or %2Instruction register%1, is an internal register
used to hold the instruction we are executing. Note that the %2PC%*
register is incremented %6before%* execution of the instruction. If we
incremented %2PC%1 %6after%1 the execution
of the instruction, and the instruction were a JUMP-type
instruction, then the %2PC%1 would get a spurious incrementation.
.P167:
A critical part of LISP evaluation involves procedure calls and returns.
Since we expect to handle recursive calling sequences,
the %3call-ret%* pair ({yon(P241)}), represented
as %3CALL%1 and %3RET%1, must take this into account. However, there is a
more fundamental requirement of this pair: they must make sure that, on completion of
a %3CALL%*, the %3RET%* can return to the instruction which directly follows the
%3CALL%*.
This requirement can be met by a less comprehensive call, say %3JSR%1
(for Jump SubRoutine),
which stores the current value of the %2PC%* in a known location. Then the
return, %3JRTH%1 (for Jump THrough),
need only pick up that saved value and restore it into the
%2PC%*. We could implement this instruction on our machine. Recall that in the
basic machine cycle the %2PC%* was incremented %6before%* the
execution of the instruction. Thus if we were about to execute a %3JSR%1
the %2PC%* is already pointing at the next instruction; all we need to do
is save the current %2PC%*.
So let's assume that %3JSR F%1 stores the %2PC%* in %3F%* and begins execution
in location %3F+1%1. Then:
.BEGIN TABIT1(19);TURN OFF "←";
%3JSR F\%3C(F) ← %3C(%2PC%3)%1. Save the %2PC%* in %3F%*.
\%3C(%2PC%*) ← F + 1.%1 Jump to location represented by %3F + 1%*.
%3JRTH F%* \%3C(%2PC%*) ← C(F).
.END
This pair is indeed how several languages perform their calling sequences.
It's fast and efficient; however it is not sufficient for recursive control.
If we always store in a %6fixed%* location, only the
result of the %6last%* store would be
available and previous values set by prior recursions would have been lost.
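For instance, suppose a hypothetical routine %3G%1, entered by %3JSR G%1, calls
itself by another %3JSR G%1. The time sequence of calls and returns would be:
.BEGIN TABIT2(12,31);TURN OFF "←";SELECT 3;GROUP;
\JSR G\%1; outer call: %3C(G) ←%1 the return address %3r%41%3
\JSR G\%1; inner, recursive call: %3C(G) ←%1 the return address %3r%42%3
\JRTH G\%1; the inner return goes to %3r%42%3
\JRTH G\%1; the outer return also goes to %3r%42%1; %3r%41%1 has been destroyed%3
.END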
What we need is an implementation of the actions of %3control%1.
For purpose of our discussion we can assume that %3control%1 operates in a
stack-like fashion⊗↓%1Unless we wish to consider extensions of LISP,
a stack is sufficient for LISP's control environment.←.
What the %3CALL%* will do is %6push%* the current contents of the %2PC%* onto the
control stack; and %3RET%* will pop off the top element and put it into the %2PC%*
register⊗↓What
will be found on the control stack is a time-sequence of
those procedures which have been entered but have not yet been completed.
Such information is exceptionally useful in debugging programs.←.
The behavior we have just described is that attributed to the %3PUSHJ-POPJ%1
pair when they are applied to the control stack. We have separated out
the %3CALL-RET%1 pair since the calling process is not always as simple
as %3PUSHJ-POPJ%1. Several things impinge on our decision:
.BEGIN INDENT 7,7;
%21.%1 We want to be able to supply detailed debugging information to the
user. How this will be accomplished will be the topic of {yonss(P168)}.
%22.%1 We want to be able to freely replace functions
with new definitions. A %3PUSHJ%1 goes to a particular sequence of instructions.
%23.%1 We want to be able to intermix compiled and interpreted programs.
Compiled programs may call interpreted
programs, and vice versa. Indeed we may even
wish to replace an interpreted (compiled) definition with a compiled (interpreted)
version.
%24.%1 In dealing with functional arguments, we must be able to transfer
control to a function variable. We cannot know where the %3PUSHJ%1 should
transfer.
.END
When an interpreted function calls a compiled (or primitive)
function, %3eval%* will look for the indicator, %3SUBR%*; then retrieve the
machine address of the code and enter via a %3PUSHJ%*. That code should exit
(back to %3eval%*) via a %3POPJ%*, after assuring that any stacks have
been appropriately restored.
.P243:
Compiled functions call other functions via %3CALL%1.
The %3CALL%* must discover how to call the function: is it a
%3SUBR%*, an %3EXPR%*, an %3FEXPR%1, etc.? The function is called and
on completion control is
returned to the address immediately following the %3CALL%*.
For example, %3(CALL fn)%1 can be implemented as %3(PUSHJ P DECODE)%1,
where %3P%1 represents the control stack pointer, and %3DECODE%1 represents
a routine to decode the actual procedure call. Within %3decode%1 we
know that %3C(C(P))-1%1 is the address of the actual call instruction; %3decode%1 then can
access the function definition associated with %3fn%1, set up the call, and
then return via a %3POPJ%1.
Within any %3CALL%1 or %3PUSHJ%1 we may call any function, including
that function itself.
This brings us to one of the most important conventions
for %6any%* stack-like call-return sequence:
Whatever
we push onto a stack within the body of a function %6must%* be
popped off before we exit from the function body. That is, the state of any stack
must be transparent to any computations which occur within the function.
This is called %2⊗→stack synchronization↔←%*.
Usually the effect of %3RET%1 is identical to that of %3POPJ%1; however, it is
conceivable that complex returns will require
special care. The basic idea in this discussion is that we will
supply two similar, compatible, but not identical call-return sequences:
%3PUSHJ-POPJ%1 is fast and simple; %3CALL-RET%1 is more general but more costly
to invoke.
In the next section we
will reconcile LISP primitives with the instruction set supplied
on %2SM%1.
.SS(Implementation of the Primitives,,P234:)
As with any representation problem, several choices are available.
We will begin our use of %2SM%* with a study of call-by-value function calls;
later we will discuss other calling sequences.
We will discuss two general implementation techniques. The first is applicable
to machines without the special %3AC%1's of the %2SM%1.
.P244:
First, we will assume only that we are able to simulate a stack. All the
operations occur on the stack. Constants will be generated by pushing the
representation on the top of the stack, essentially creating a %3dest%1 block.
A function call, %3f[t%41%3;#...#;t%4n%3]%1,
expects its arguments as the top %3n%1 elements of the stack,
with the value of %3t%4n%1 on the top of the stack, and the other values
below.
As the function is called, the %3dest%1 block on the top of the stack
becomes the local environment. The function replaces the top %3n%1 elements
with the value of the function, thus %3send%1-ing its value to the destination of
the caller.
This model is a restricted subset of LISP, but it is
a very useful subset.
It will develop into a more robust example as the chapter progresses.
The technique is extendible to support the implementation
model we developed in {yonss(P148)}.
.GROUP;
Here's an example of the implementation for the expression %3f[g[A];C;h[B]]%1:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
\ (PUSH P (QUOTE A))\%1; make argument for call on %3g
\ (CALL G)\%1; call the function%3
\ (PUSH P (QUOTE C))\%1; place second argument%3
\ (PUSH P (QUOTE B))
\ (CALL H)\%1; %3h%1 only uses (and removes) %3B
\ (CALL F)\%1; after the call, %3f[g[A];C;h[B]]%1 is
\\; on the top of the stack.
.END
.APART
Now we will give implementations of the LISP primitives which result
in reasonably efficient code on the %2SM%1, and which also
reflect several practices applied in current LISP implementations.
We will take advantage of the existence of the special %3AC%1's;
the usual hardware implementation of such special registers
allows access to their contents in less time than typical
stack references⊗↓There is a question whether such special registers
should be considered good architecture or a trick. The Burroughs
6700-7700 uses special hardware to decrease access time to the initial
stack segment. The PDP-10 uses special registers. One can
argue that such special tricks belong in the hardware, and that the
machine presented to the programmer be correspondingly more
uniform ⊗↑[Dor#76]↑.←.
Since the creating, saving, and restoring of destination blocks can be expensive,
we will try to minimize those kinds of activities. We will use
our special registers %3AC1%1, through %3ACn%1 to build parameter lists
and pass parameters. This entails
several conventions.
We will try to use the accumulators as the destination block.
Our early compilers will be sufficiently weak that this desire can
be met. Later we will have to modify our stand slightly.
The actual parameters for a function call,
%3f[t%41%3;#...#;t%4n%3]%1, will be developed in %3AC1%1 through %3ACn%1.
In the early compilers we will also pass the evaluated parameters
to the called function using the accumulators. Thus values will tend to stay
in the %3AC%1's unless forced out. They can be forced out by %3alloc%1
since a call to %3alloc%1 is supposed to save the current %3dest%1.
The interplay between %3next%1, %3link%1, and %3send%1 requires care.
.P235:
We will assume
that we are compiling for single valued functions, and therefore
we must resolve the question of where to put the value of a function.
Again consider
%3f[t%41%3;#...#;t%4n%3]%1; we might expect that each %3t%4i%1 be responsible
for placing its result in the proper %3ACi%1. Indeed that is the spirit
of the %3send%1 operation; it knows where the result should be placed.
This strategy requires some careful register allocation if it is to
be carried out successfully. We will postpone this discussion for a while.
There is a simpler solution available: a function always returns its
value in %3AC1%1 and leaves the register allocation up to the
calling function. There are at least two strategies here:
.GROUP;
.BEGIN CENTERIT;SELECT 2;
←Convention 1
.END
We try to build the %3dest%1 block in the %3AC%1's and also use the %3AC%1's
to pass parameters and values.
.BEGIN INDENT 10,10,10;
A function call, %3f[t%41%3; ...;t%4n%3]%1, expects
its arguments to be presented in %3AC1%1 through %3ACn%1.
We try to compute the values of %3t%4i%1 directly in %3ACi%1.
This is easy if %3t%4i%1 is a constant; if %3t%4i%1 is a function call
on %3g%1, we save %3AC1%1 through %3ACi-1%1; set up the arguments
to %3g%1; perform the call, returning the result in %3AC1%1; move the
result to %3ACi%1; and restore the saved values of the %3t%1's.
.END
.APART
.GROUP
.BEGIN CENTERIT;SELECT 2;
←Convention 2
.END
We try to build the %3dest%1 block in the top of the stack, using the
%3AC%1's for passing parameters and returning values.
.BEGIN INDENT 10,10,10;
A function call, %3f[t%41%3; ...;t%4n%3]%1, expects its arguments to
be presented in %3AC1%* through %3ACn%*.
As we compute each %3t%4i%1, we store
the result on the stack %3P%*.
Thus the execution sequence should be:
.END
.BEGIN centerit;
←compute value of %3t%41%1, push onto stack %3P%*.
← . . .
←compute value of %3t%4n-1%1, push onto stack %3P%*.
←compute value of %3t%4n%1, move into %3ACn%*.
.END
.APART
.BEGIN INDENT 10,10,10;GROUP
After this computation the values,
V%4n-1%*, ..., V%41%*,
of the arguments are stored
from top to bottom in %3P%* with V%4n%1 in %3ACn%1.
Thus to complete the function invocation, we need only pop the arguments
into the %3AC%*'s in the correct order and call %3f%1.
We did not push V%4n%1 since we expected to pass the parameters to %3f%1
in %3AC1%1 through %3ACn%1.
.END
.BEGIN INDENT 10,10,10;GROUP
.BEGIN CENTERIT;SELECT 2;
←General conventions
.END
.P174:
When a function completes evaluation, it is to place its value
in %3AC1%*. Nothing can be assumed about the contents of any other %3AC%*. If an
%3AC%* contains information we need then it must be saved on the
stack %6before%* calling the function.
Instead of referring to %3AC1, AC2, ..., ACn%* we will simply
use the numbers, %31, 2,#...,#n%* in the instructions.
.END
We give an example of both conventions
for the expression %3f[g[A];C;h[B]]%1.
We use a list representation of the instructions and code sequences in preparation
for future discussions.
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL G)\%1; call the function%3
\ (MOVEI 2 (QUOTE C))\%1; place second argument%3
\ (PUSH P 1)\%1; but now we have to save the values%3
\ (PUSH P 2)\%1; since we must compute %3h[B]
\ (MOVEI 1 (QUOTE B))
\ (CALL H)
\ (MOVE 3 1)\%1; move the result to %3AC3
\ (POP P 2)\%1; restore %3AC2%1 and %3AC1%1 in the correct order%3
\ (POP P 1)
\ (CALL F) )
.CENTER;SELECT 2;
Convention 1
.END
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL G)\%1; call the function%3
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE C))
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE B))
\ (CALL H)
\ (MOVE 3 1)\%1; don't need to save the value%3
\ (POP P 2)\%1; since this is the last argument.%3
\ (POP P 1)
\ (CALL F) )
.CENTER;SELECT 2;
Convention 2
.END
Neither compiling convention produces optimal code for all occasions.
If the parameter list to a function call contains only constants,
then the first convention
produces better code. If there are many nested function calls then
it may produce very bad code. We will worry more about efficiency after we
develop the basic compiling algorithms.
At the highest level, our compiler will generate code for the
%3alloc-link-call-...%1 machine. But frequently we will express the
code in terms of one of our more traditional representations.
The output from the compiler
is to be a list of instructions, in the order which we
would expect to execute them. Each instruction is a list: an operation
followed by as many elements as are required by that operation.
We can execute the compiled code by simulating the actions of our machine
on each element of the sequence. However it is more efficient to
translate this compiler output further, producing a sequence of actual
machine instructions, placed in memory and suitable for execution by the
hardware processing unit.
We will allocate an area of memory which can receive the processed
compiler output.
This area is usually called %2⊗→Binary Program Space↔←%* (BPS).
The translation program
which takes the output from the compiler and converts it into actual
machine instructions in BPS is called an assembler.
.SS(Assemblers,Assemblers,P7:)
In {yonss(P230)} we gave an abstract description of an algorithm for
executing sequences of instructions.
In this section we discuss the mechanism for getting the LISP list, representing
instructions, turned into real instructions in Binary Program Space.
Part of the process involves the actual instructions;
before a machine can execute an instruction it must be transformed into a
numerical code which the machine understands.
Part of the process involves
establishing a link between the code in memory and the LISP evaluator;
when calling a routine which has been compiled, the evaluator must know where
to find it. Therefore, part of the process involves mapping the labels to
locations in the machine.
There are two alternatives available to solve these problems.
We might incorporate these operations into the compiler. Then the
output from the compiler would go directly to locations in BPS.
We invoke this solution in {yonss(P240)} when we combine compiling
and interpreting. With the traditional approach, direct loading of compiled
code would require re-compilation whenever we wished to initiate a new
LISP process which needed that function.
The alternative is to compile the functions onto an external medium; then
we can load the compiled code at any time.
The premise here is that
compilation is an expensive operation relative to the cost of loading
compiled code.
In this alternative scheme, the tasks of loading, locating labels, and
linking the code with other modules, are all
accomplished by a program called an %2assembler%1. After the assembler has
placed the decoded assembly language in BPS, it indicates that the value
of the function name is the location of the assembled code.
One of the arguments to the assembler should be the representation
of the program. Another argument should describe where in ⊗→BPS↔←
we wish the assembled code to be located. We should also have access
to an initial symbol table, describing the pre-defined
symbol names. These pre-defined names might include information
about the actual machine locations for the utility functions, the values
of special stacks or registers which the compiler uses internally, and
it must also include a special list giving a correspondence between
the names like %3ALLOC%1 or %3PUSHJ%1 and the actual numbers which the hardware
uses in interpreting the instruction⊗↓A hardware machine is just another
interpreter like %3eval%1. It is usually not recursive, but performs more
like %3loop%1 in the %3prog%1-evaluator.←.
The assembler can go %3rest%*-ing down the program list, looking up
definitions and manufacturing the numerical equivalent of each
instruction, then depositing that number in the appropriate machine
location.
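Ignoring labels and %3QUOTE%1d constants for the moment, such a pass might be
sketched as follows; the primitives %3encode%1, which builds the numerical word
for a single instruction from the symbol tables, and %3deposit%1, which stores
that word into BPS, are assumed here:
.BEGIN TABIT3(17,36,38);GROUP;SELECT 3;TURN OFF "←";
\assemble <= λ[[code;loc]prog[[]
\\l\[null[code] → return[loc]]
\\\deposit[loc;encode[first[code]]];
\\\code ← rest[code];
\\\loc ← add1[loc];
\\\go[l] ]]
.END
The value returned is the first free location after the assembled code.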
.GROUP
Below is a low level representation of a code sequence for
%3f[g[A];h[B]]%* assuming that %3h%1 is a primitive routine.
.P169:
It might be a list of the form:
.BEGIN TABIT2(10,45);SELECT 3;
\((MOVEI 1 (QUOTE A))\%1; make argument for call on %3g
\ (CALL G)\%1; call the function%3
\ (PUSH P 1)\%1; save the value%3
\ (MOVEI 1 (QUOTE B))
\ (PUSHJ P H) ⊗↓%1Note that we use %3P%1; this assumes we use %3P%1 to save %6both%1 value and control information.←%3
\ (MOVE 2 1)
\ (POP P 1)
\ (CALL F) )
.END
.APART
%1
The machine representations of these instructions are encodings of
specific fields of specific machine locations with specific numbers.
For example, the operation %3PUSH%* is represented as a certain number,
called its %2⊗→operation code↔←%* or %2op code%*, and which will
occupy a certain area of a machine word so that the CPU can interpret
it as an instruction to push something onto a stack. Other fields in the
instruction are to be interpreted as references to stacks, to memory locations,
to accumulators, constants or external references to other routines.
The purpose of an assembler is to translate these mnemonic instructions
into machine instructions.
Essentially all that the assembler need do is search ⊗→symbol tables↔←
for the opcodes, for subroutine names, for accumulator and stack names,
and store the resulting values in the appropriate machine locations.
Things are slightly more complex: later we must also %6add%* information to
the tables.
We must exercise a bit of care in handling %3QUOTE%*d expressions.
Assembling a construct like %3(MOVEI 1 (QUOTE (A B C)))%* should
have the effect of constructing the list %3(A B C)%* in free space
and placing an instruction in memory to load the address of this list into %3AC1%*.
What we must notice is that this list %3(A B C)%* is subject to
garbage collection and, if left unprotected, could be destroyed.
There are a couple of solutions. Perhaps the garbage collector could
look through compiled code for any references to free-space or full-word-space;
or we could make a list of all of these constants and let the garbage
collector mark the list.
Looking through compiled code is expensive; keeping a %3QUOTE%*-list
is a reasonable compromise. It is a compromise since that strategy
might retain unnecessary structures in case functions were redefined or
recompiled.
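For instance, the assembler might note each %3QUOTE%1d constant it deposits on a
global list which the garbage collector then treats as one of its base pointers;
the name %3quote_list%1 is used here only for illustration:
.BEGIN TABIT1(24);TURN OFF "←";SELECT 3;GROUP;
note_const <= λ[[x]prog[[]
\quote_list ← concat[x;quote_list];
\return[x] ]]
.END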
The assembler also needs to recognize that there are different instruction formats.
That is, some instructions use an opcode and a memory reference: %3(JUMP#L)%*;
some use an opcode, accumulator, and an address: %3(PUSH#P#1)%*; and some
use a LISP construct: %3(MOVEI#1#(QUOTE A))%*.
Therefore, the assembler
has to have an initial symbol table of opcodes and
stack numbers.
.BEGIN TABIT2(35,45);GROUP;
Here is a sample op-code table with the machine equivalents:
%2\symbol\value%3
\MOVE\200
\MOVEI\201
\SUB\274
\JUMP\254
\JUMPE\322
\JUMPN\326
\PUSH\261
\POP\262
\PUSHJ\260
\POPJ\263
\RET\263
\CALL\034
\P\14
.END
And here's what the above example might resemble after being assembled:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααπααπααααα⊃
100 ~201~ 1~ 405~
εαααβααβαααααλ
~034~ ~ 1107~
εαααβααβαααααλ
~261~14~ 1~
εαααβααβαααααλ
~201~ 1~ 406~
εαααβααβαααααλ
~260~14~11121~
εαααβααβαααααλ
~200~ 2~ 1~
εαααβααβαααααλ
~262~14~ 1~
εαααβααβαααααλ
~034~ ~ 1051~
%ααα∀αα∀ααααα$
.END
where %3A%1 is located at %b405%1; the atom %3F%1 begins at %b1051%1,
and the instruction sequence for %3h%1 begins at %b11121%1, etc.
.SS(Compilers for Subsets of LISP)
We will examine compilers for increasingly complex subsets of LISP
beginning with functions, composition and constant arguments and
ending with a more realistic compiler for a reasonable subset of
pure LISP. Though each subset is a simple extension of its predecessor,
each subset introduces a new problem to be solved by the compiling algorithm.
If the corresponding evaluator (⊗→%3tgmoaf%*↔←, ⊗→%3tgmoafr%*↔←, and the most
simple ⊗→%3eval%*↔←) is well understood, then the necessary steps
to be added to the compiler are easy to make.
The first compiler will handle representations of that
subset of LISP forms defined by the following BNF equations:
.BEGIN TABIT1(11);GROUP
<form>\::= <constant> | <function>[<arg>; ...; <arg>]
<arg>\::= <form>
<constant>\::= <sexpr>
<function>\::= <identifier>
.END
This LISP subset corresponds closely to that of ⊗→%3tgmoaf%*↔←, handling only
function names, composition, and constant arguments.
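For instance, %3f[g[A];(B . C)]%1 lies in this subset; what the compiler actually
receives is the S-expression representation of that form:
.BEGIN CENTER;SELECT 3;
(F (G (QUOTE A)) (QUOTE (B . C)))
.END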
In the interest of readability and generality, we will write the functions using
constructors, selectors, and recognizers and supply the necessary
bridge to our specific representation by simple sub-functions.
All the compilers we develop will be derived from the second compiling
convention, saving the results on the top of the stack. Our compilers will
incorporate some knowledge of the %2SM%1 machine, and we will try to
note places where substantial assumptions about machine structure have been
made.
.GROUP
.FILL
%3compexp%1 expects to see either a constant or a function followed by a
list of zero or more arguments. The appearance of a constant should
elicit the generation of a list containing a single instruction to %3send%1
back the representation of that constant; %3mksend[dest;exp]%*
is a call on the constructor to generate that instruction.
Since values are always found in %3AC1%1, that should
be the destination for the %3send%1. Since we are assuming the expression is a
constant, the operation can be a %3MOVEI%1.
If the expression is not a constant,
we can assume it is a call-by-value application. We should
generate code to evaluate each argument,
and follow that with code for a function call.
.BEGIN SELECT 3;TABs 11,18,21,35;turn on "\";nofill;
%3compexp <=\λ[[exp]\[isconst[exp] → list[mksend[1;exp]];
\\ %et%* →\λ[[z] compapply[\func[exp];
\\\\complis[z];
\\\\length[z]]]
\\\ [arglist[exp]] ]]]%1
.END
.APART
.GROUP
.FILL
%3complis%1 gets the representation of the argument list; it must
generate a code sequence to evaluate each argument and increment the destination.
After we have compiled the last argument we should not increment the
destination.
Notice that we have used %3append%* with three
arguments; this could be justified by defining %3append%* as a macro ({yonss(P8)}).
.BEGIN SELECT 3;TABIT2(14,25);
%3complis <= λ[[u]\[null[u] → ( );
\ null[rest[u]] → compexp[first[u]];
\ %et%* → append[\compexp[first[u]];
\\list[mkalloc[1]];
\\complis[rest[u]]]] ]
.END
.APART
.GROUP
.FILL
%3compapply%1 has a simple task:
it takes the list of instructions
made by %3complis%* and adds instructions at the end of the list
to link the argument values into the proper %3AC%1's and to generate a function call on %3fn%*.
Here's %3compapply%*:
.BEGIN SELECT 3;TABIT1(30);
.P82:
%3compapply <= λ[[fn;vals;n] append[\vals;
\mklink[n];
\list[mkcall[fn]]]]
.END
.APART
.SELECT 1;
Finally, here are the constructors, selectors, and recognizers:
.BEGIN TABIT1(16)GROUP;
.P242:
%2Recognizer%3
isconst <= λ[[x] or\[numberp[x];
\ eq[x;%et%3];
\ eq[x;%ef%3];
\ and[not[atom[x]];eq[first[x];QUOTE]]]]
%2Selectors%3
func <= λ[[x] first[x]]
arglist <= λ[[x] rest[x]]
%2Constructors%3
mksend <= λ[[dest;val] list[MOVEI;dest;val]]
mkalloc <= λ[[dest] list[PUSH;P;dest]]
mkcall <= λ[[fn] list[CALL;fn]]
mklink <= λ[[n][eq[n;1] → ( ); %et%3 → concat[mkmove[n;1];mklink1[sub1[n]]]]]
mklink1 <= λ[[n][zerop[n] → ( ); %et%3 → concat[mkpop[n];mklink1[sub1[n]]]]]
mkpop <= λ[[n] list[POP;P;n]]
mkmove <= λ[[dest;val] list[MOVE;dest;val]]
.END
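For example, given %3(F (G (QUOTE A)) (QUOTE B))%1, the representation of
%3f[g[A];B]%1, tracing the definitions above by hand gives:
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
\((MOVEI 1 (QUOTE A))\%1; %3complis%1: code for the first argument%3
\ (CALL G)\%1; %3mklink[1]%1 is empty%3
\ (PUSH P 1)\%1; save that value on the stack%3
\ (MOVEI 1 (QUOTE B))\%1; code for the last argument%3
\ (MOVE 2 1)\%1; %3mklink[2]%1: the last value goes to %3AC2%1 ...%3
\ (POP P 1)\%1; ... and the saved value back to %3AC1
\ (CALL F) )
.END
which is just the second compiling convention again.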
Note that %3compexp%* is just a
complex %9r%*-mapping whose image is a sequence of machine language instructions.
The code generated by this compiler is inefficient, but that
is not our main concern. We wish to establish an intuitive and
correct compiler, then
worry about efficiency. Premature concern for efficiency
is folly; we must first establish a correct and clean algorithm.
.CENT(Problems)
I. Write %3compexp%1 to generate code for %2Convention 1%1 as discussed
on {yon(P235)}. Compare the two versions of %3compexp%1;
now write a more abstract version which encompasses both as special cases.
II. Write %3compexp%1 and associated functions for a stack-only machine
using the techniques outlined on {yon(P244)}.
.SS(Compilation of Conditional Expressions)
The innovation which occurred in %3tgmoafr%* was the introduction of conditional
expressions. To describe conditional expressions, the BNF equations
were augmented by:
←<form> ::= [<form> → <form> ; ... ;<form> → <form>]
The addition of conditional expressions will mean an extra
piece of code in %3compexp%*
to recognize %3COND%* and a new function (analogous to %3evcond%* in %3tgmoafr%*)
to generate the code for the %3COND%*-body⊗↓If we had designed the compiler
like the
evaluators in {yonss(P237)} we would need only attach a compile-property
to the atom %3COND%1, and make the property-value %3COMPCOND%1.←.
In fact, the only difference between %3evcond%* and its counterpart in
%3compexp%*, which we shall call %3comcond%*, is that %3comcond%* generates
code which when executed will produce the same value as the value
produced by %3evcond%* operating on the given S-expression.
The effect of %3comcond%* on a conditional of the form:
.P103:
%2COND%*←%3(COND%* %3(%eR%f(%3 p%41 %f)%3 %eR%f(%3 e%41%f)%3 ) ... (%eR%f(%3 p%4n%3 %f)%3 %eR%f(%3 e%4n %f)%3))
%1 follows from the discussion of %3receive_test%1 on {yon(P236)}.
First generate code for p%41%*; then generate a test for truth, going to the
code for e%41%* if true, and going to the code for p%42%* if not true.
The code for e%41%* must be followed by an exit from the code for the
conditional, and we should generate an error condition to be executed
in the case that p%4n%* is false.
.BEGIN NOFILL;GROUP;
We represent the code as:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
∞1<code for p∞41∞1>∞g
/\
/ \
∞fG∞* ∞fH∞*
T NIL
/ \
∞fG∞* ∞fH∞*
∞1<code for e∞41∞1>∞g ∞1<code for p∞42∞*>∞g
∞fG∞* ∞fGH∞*
/ T NIL
∞fG∞* ∞fG∞* ∞fH∞*
/ ∞1<code for e∞42∞1>∞g ∞1<code for p∞43∞*>∞g
∞fG∞* / ∞fGH∞*
∞fH∞* ∞fG∞* ... ...
\ / ∞fG∞* ∞fH∞*
∞fH∞* ∞fG∞* T NIL
\ / / \
\ / ∞fG∞* ∞fH∞*
∞fH∞* ∞fG∞* ∞1<code for e∞4n∞1>∞g ∞1<code for error∞*>∞g
\ / ∞fG∞*
∞fH∞* ∞fG∞*
... /
\ /
∞fH∞* ∞fG∞*
.END
.END
We will express %3comcond%1 in terms of %aSM%1 primitives.
This requires the establishment of more conventions for our compiler,
since we must implement %3receive_test%1.
.GROUP
.BEGIN CENTERIT;SELECT 2;
←More Compiling Conventions
.END
.BEGIN INDENT 10,10,10;
We must be able to test for %et%* (or %ef%*).
Previous conventions imply that the value of a predicate will be found in %3AC1%1.
We can test for the occurrence of %et%1 or
%ef%* using the %3JUMPT%1 or %3JUMPF%* instruction
(see {yonss(P33)}) respectively⊗↓Note
that this implementation maps everything not %ef%* onto
%et%1. Thus any value other than
%ef%* will be considered %et%*. See {yonss(P214)}.←.
.END
.APART
We can implement %3receive_test%1 using %3JUMPT%1 or %3JUMPF%1, and
we can implement %3goto%1 using %3JUMP%1.
Since our code is to be a %6sequence%* of instructions,
we must linearize the graph-representation of the generated code.
We can generate a sequence representation by appropriately interspersing labels
and %3JUMP%1s between the blocks of instructions for the p%4i%1's and e%4i%1's.
We will generate:
.BEGIN TABIT2(30,35);GROUP
.P238:
\%3(\%1<code for p%41%*>%3
\\(JUMPF 1 L1)%1
\\<code for e%41%*>%3
\\(JUMP L0)
\L1\%1<code for p%42%*>%3
\\(JUMPF 1 L2)
\ ...
\Ln-1\%1<code for p%4n%*>%3
\\(JUMPF 1 Ln)%1
\\<code for e%4n%*>%3
\\(JUMP L0)
\Ln\(JUMP ERROR)
\L0\ )
.END
%1
We need to construct the labels, %3Li%*. These labels should be
atomic and should be distinct. LISP has a function, ⊗→%3gensym%*↔←, which
can be used for this task. %3gensym%* is a function of no arguments whose
value is a distinctive symbol called a generated symbol, or "gensym".
Gensyms are not true atoms since they are not placed in the object list;
they are usually used only
for their unique name. If it is desired to use them as atoms,
they must be placed on the object list using
the function %3intern%1 ({yon(P277)}). Gensyms are distinct from each
other and will be distinct from all other atoms unless you read an atom
with that pname⊗↓In many versions of LISP, gensyms are of the form
%3Gn%* where %3n%* is a four digit number beginning at %30000%*. Thus
the first call of %3gensym[ ]%* would give %3G0000%*; succeeding calls would give
%3G0001, G0002%*, etc.←.
We want to write a recursive version of %3comcond%*; therefore
we must determine what code gets generated on each
recursion and what code gets generated at the termination case.
.GROUP;
Looking at the example, we see that the block
.BEGIN CENTERIT;
←%3(%1<code for p%4i%*> %3(JUMPF 1 Li) %1<code for e%4i%1> %3(JUMP L0) Li)%1
.END
is the natural segment for each recursion and that:
.BEGIN CENTERIT;
←%3((JUMP ERROR) L0)%1
.END
is to be expected for the termination case.
Within each
block we need a "local" label, %3Li%1; and within each block,
including the terminal case, we refer to the
label %3L0%* which is "global" to the whole conditional.
We can now add the recognizer for
%3COND%* to %3compexp%* and construct %3comcond%*.
.APART
.BEGIN SELECT 3; TABIT2(19,30);TURN OFF"←";GROUP;
%1Add the clause:
.BEGIN CENTER;SELECT 3;
iscond[exp] → comcond[args%4c%3[exp];gensym[ ]];
.END
%1to%3 compexp %1where:
%3
comcond <= λ[[u;glob]\[null[u] → list[mkerror[ ];glob];
\ %et%* → append[\comclause[first[u]; gensym[];glob];
\\comcond[rest[u]; glob] ]]]
.END
.BEGIN SELECT 3; TABIT1(28);TURN OFF"←";GROUP;
comclause <=λ[[p;loc;glob] append\[compexp[ante[p]];
\ list[mkjumpf[loc]];
\ compexp[conseq[p]];
\ list[mkjump[glob];loc] ]]
.END
.BEGIN SELECT 3;GROUP;NOFILL;
%2Recognizer%3
iscond <= λ[[x] eq[first[x]; COND]]
%2Selectors%3
args%4c%3 <= λ[[x] rest[x]]
ante <= λ[[c] first[c]]
conseq <= λ[[c] second[c]]⊗↓%1This definition of %3conseq%1
does not allow extended conditional expressions. See problem II on {yon(P270)}.←%3
%2Constructors%3
mkerror <= λ[[] (JUMP ERROR)]
mkjumpf <= λ[[l] list[JUMPF;1;l]]
mkjump <= λ[[l] list[JUMP;l]]
.END
.SELECT 1;
.BEGIN GROUP;TABIT2(30,38);
The partially exposed recursive structure of %3comcond%* would show:
←%3comcond[((%1p%41%* e%41%3) ...(%1p%4n%* e%4n%3));G0000]=
\%3(%1\<code for p%41%*>%3
\\(JUMPF 1 G0001)%1
\\<code for e%41%*>%3
\\(JUMP G0000)
\ G0001\comcond[((%1p%42%* e%42%3) ...(%1p%4n%* e%4n%3)); G0000])
.END
.SELECT 1;
We need to extend our assembler to handle the generated labels and jumps which
appear in the conditional expression code.
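As a concrete check of the label structure, here is a hedged S-expression
(Common Lisp) sketch of %3comcond%* and %3comclause%*; %3compexp%1 is replaced
by a stub so that only the jumps and labels are visible, and %3gensym%1 plays
the role described above.
.BEGIN NOFILL;GROUP;
;;; A sketch of the conditional-expression compiler; compexp-stub is a
;;; stand-in for the real compexp.
(defun mkerror () '(JUMP ERROR))
(defun mkjumpf (l) (list 'JUMPF 1 l))
(defun mkjump  (l) (list 'JUMP l))
(defun compexp-stub (exp) (list (list 'CODE-FOR exp)))

(defun comclause (p loc glob)
  (append (compexp-stub (first p))       ; code for the predicate
          (list (mkjumpf loc))           ; false: go to the next clause
          (compexp-stub (second p))      ; code for the consequent
          (list (mkjump glob) loc)))     ; exit jump, then the local label

(defun comcond (u glob)
  (if (null u)
      (list (mkerror) glob)
      (append (comclause (first u) (gensym) glob)
              (comcond (rest u) glob))))

;;; (comcond '((P1 E1) (P2 E2)) (gensym)) produces the linear pattern
;;; shown above: code, (JUMPF 1 Li), code, (JUMP L0), Li, ..., (JUMP ERROR), L0.
.END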
.CENT(Problems)
.BEGIN TABIT2(12,28);
I. Evaluate:\%3compexp[(COND\((EQ (QUOTE A)(QUOTE B))(QUOTE C))
\\((NULL (QUOTE A))(QUOTE FOO)))]
.END
.P270:
II. Extend %3comcond%1 to handle extended conditional expressions.
.SS(One-pass Assemblers and Fixups)
The compiled instructions for conditional expressions
add one more task for our assembler:
we must handle the generated label constructs.
Recall the pattern for conditional expression code given on {yon(P238)}.
The code sequence consists of representations of instructions and labels.
The symbols, %3L0, L1,%* and %3L2%* in that example are
generated symbols, representing labels. Though the gensyms are %6not%1
true atoms, they %6will%1 satisfy the test for %3atom%1. Therefore
we can recognize an occurrence of a label
using the predicate %3atom%*.
If the assembler recognizes a label definition, it should
add information to a symbol table, noting that the label has
been seen and that it is associated with a specific location in memory.
Further references to that label can be translated to references
to the associated machine location. The only problem is that references
to labels may occur %6before%* we have come across the label definition
in the program. Such references are called %2⊗→forward reference↔←s%*.
For example, all references in the %3COND%1-code are forward
references⊗↓If we scan the instruction sequence in the order in which the
code would be executed, we always refer to a label before we
come across its definition. We could skirt the forward reference
problem by loading the program in reverse order: rather than beginning with
the first instruction and loading %6upward%1 in memory, we could
begin with the last instruction and load downward. However, this
would only be a temporary expedient: an assembler must also handle
%3prog%1s. The label structure of %3prog%1s is not restricted to such
predictable behavior.←. There are two solutions to the
⊗→forward reference↔← problem:
.BEGIN INDENT 0,3;
%21.%* Make two passes through the input program. The first pass
builds a symbol table of labels describing
where the labels will be assigned in memory.
A second pass
uses this symbol table to actually assemble the code into the machine.
%22.%* The other solution is to make one, more complex, pass through the input.
Whenever we come across a forward reference we add information to the symbol table
saying that a forward reference has occurred at this location. We assemble
as much of the instruction as we can. When a label definition %6does%*
occur, we check the table to see if there are any forward references pending
on that label. If there are, we %2⊗→fix-up↔←%* those instructions to
reference the location now assigned to the label.
.END
There are some restrictions which are imposed on one-pass assemblers, but
particularly for assembling compiled code, one-pass assemblers are usually
sufficient and are quite fast.
.<<FIXUPS>>
There are at least two ways to handle fixups of forward references. If
the fixups are of a particularly simple nature, say only requiring fixups
to the address-part of a word, then we may link those pending forward references
together, chaining them on their, as yet, un-fixed-up field.
.BEGIN GROUP
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
⊂αααααπααααα⊃
~ ~ ≤' ~←α⊃
εαααααβαααααλ ~
~ ~ ~ ~
εαααααβαααααλ ↑
~ ~ ~ ~
εαααααβαααααλ ~
~ ~ #ααβαα$
~ ~ ~←α⊃
εαααααβαααααλ ~
~ ~ ~ ↑
εαααααβαααααλ ~
~ ~ ~ ~
εαααααβαααααλ ↑
~ ~ #ααβαα$
~ ~ ~←α⊃
εαααααβαααααλ ~
~ ~ ~ ↑
εαααααβαααααλ ~
~ ~ ~ ↑
εαααααβαααααλ ~
~ ~ ~ ~
εαααααβαααααλ ↑
pointer from αααααα→ ~ ~ #ααβαα$
entry in object list %ααααα∀ααααα$
.END
.begin centerit select 2;
←A Simple Fixup Scheme
.END
.END
Each time a forward reference is seen it is added to the linked list;
when the label is finally defined and given an address in memory, then the
address replaces each reference link. No extra storage is used since the
linked list is stored in the partially assembled code.
Another solution, which is potentially more general (that is, it could
handle arbitrary partial-word fixups) is to store the information
about each fixup in the symbol table under each label.
.BEGIN GROUP
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;group;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
from object list entry
~ ⊂αααααπααα⊃ ⊂αααααπααα⊃ ⊂αααααπααα⊃
%αα→~ # ~ #αβαα→~ # ~ #αβαα→~ # ~ . . .
⊂ααααααααααα⊃ %ααβαα∀ααα$ %ααβαα∀ααα$ %ααβαα∀ααα$
~ ~ ↓ ↓ ↓
εαααααααααααλ ~ ~ ~
~ ~←ααααα$ ~ ~
εαααααααααααλ ↓ ↓
~ ~ ~ ~
εαααααααααααλ ~ ~
~ ~←ααααααα←ααααααα←ααα$ ~
εαααααααααααλ ↓
~ ~←ααααααα←ααααααα←ααααααααα←ααααααα$
εαααααααααααλ
~ ~
ε . . . λ
.END
.BEGIN CENTERIT; SELECT 2;
←Another Fixup Scheme
.END
.END
In this scheme we could store additional information with each link in the
list. That information could tell the fixup routine how to modify the
designated location.
.GROUP;
Both methods are useful. Both have been used extensively in assemblers and
compilers. We now sketch a simple one-pass assembler, named %3assemble%1.
To describe %3assemble%*, we will need two functions:
.BEGIN INDENT 0,13;
%21.%* ⊗→%3deposit%*↔←%3[x;y]%*: %3x%* represents a machine address; %3y%* is a list,
representing the instruction to be deposited. %3y%* could be a list of
elements: %3(%*opcode, accumulator number, memory address%3)%*
The value of %3deposit%* is the value of %3y%*.
.END
.APART
.BEGIN INDENT 0,12;group;
%22.%* ⊗→%3examine%*↔←%3[x]%*: %3x%* represents a machine address. The value
of %3examine%* is the contents of location %3x%* in the form of a list as
specified above.
.END
We can now use our fixup mechanism, combined with
%3examine%*, %3deposit%*, and %3putprop%* and
%3remprop%* from
{yon(P52)} to write the parts of the assembler which worry about forward
references and labels.
We will use the second fixup scheme. If the label has been assigned a location
then the property list of the label will contain the indicator %3SYM%* and an
associated value representing the assigned location.
If the label has %6not%* been previously defined but has been referenced then
the atom will have an indicator %3UNDEF%*; the value-part will be a list
of all those locations which reference that label. Since we will only
be doing simple fixups, this will be sufficient. The contents of any location
referenced from such a fixup list will be a partially assembled word⊗↓which may
be an instruction
or data← with the memory address portion set to 0.
When the label finally is defined we must perform the fixups,
delete the %3UNDEF%* pair, and add a %3SYM%* pair.
There are two main functions.
%3defloc%* is called when a label has been defined; if there are no pending forward
references then the %3SYM%* pair is simply added, otherwise the fixup mechanism is
exercised.
When a label is referenced, %3gval%* is called. If the label is already defined
then it simply returns the %3SYM%* value; otherwise it adds a forward reference to the
list.
.BEGIN SELECT 3;GROUP;TABIT3(15,22,29);TURN OFF "←";
defloc <= λ[[lab;loc] prog[[z]
\\[null[z ← get[lab;UNDEF]] → go[a]]
\fixup\deposit[\car[z];
\\\fixit[examine[car[z]];loc]];
\\[z ← cdr[z] → go[fixup]];
\\remprop[lab;UNDEF];
\a\return[putprop[lab;loc;SYM]] ]]
.END
.BEGIN SELECT 3;GROUP;TABIT1(13);
gval <= λ[[lab]\[get[lab;SYM];
\ %et%* → putprop[lab;cons[loc;get[lab;UNDEF]];UNDEF]; 0]]
.END
%3fixit <= λ[[x;l] mkinstr[op[x];ac[x];add[x];l]]%*
Notes: these functions use lots of tricks.
.BEGIN INDENT 10,14;GROUP;
%21.%1 In %3defloc%1 we use %3get%1 as a predicate, relying on our
convention that a non-%3NIL%1 value represents truth ({yonss(P214)}).
%22.%1 In that same conditional, we also rely on the fact that the value
of an assignment statement is the value of its right hand side. We appeal to
points %21%1 and %22%1 in the second conditional of %3defloc%1.
%23.%* In %3gval%1, there is no e%41%*; recall ({yonss(P214)}) that
if p%41%* evaluates to something non-%3NIL%*,
then that value is the value of the conditional expression.
%24.%* We also use an extended conditional in %3gval%1, executing
the %3putprop%* and then returning %30%*.
%25.%1 Note also that %3loc%* is a non-local variable in %3gval%1.
.END
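To make the mechanism concrete, here is a hedged sketch of the second fixup
scheme in modern S-expression (Common Lisp) form. Memory is simulated by a
vector of instruction lists; %3deposit%1, %3examine%1, and the three-field
instruction format are the assumptions described above, and the location
counter appears as the global %3*loc*%1, playing the part of the non-local
%3loc%1 in %3gval%1.
.BEGIN NOFILL;GROUP;
;;; A simulation of the one-pass fixup mechanism.
(defparameter *memory* (make-array 64 :initial-element nil))
(defvar *loc* 0)                         ; location being assembled

(defun deposit (x y) (setf (aref *memory* x) y) y)
(defun examine (x) (aref *memory* x))

(defun fixit (word loc)                  ; rebuild (op ac addr) with addr = loc
  (list (first word) (second word) loc))

(defun gval (lab)                        ; a reference to a label
  (cond ((get lab 'sym))                 ; defined: use its address
        (t (setf (get lab 'undef)        ; otherwise record a forward reference
                 (cons *loc* (get lab 'undef)))
           0)))                          ; and assemble a zero address field

(defun defloc (lab loc)                  ; a label definition
  (dolist (ref (get lab 'undef))         ; fix up the pending references
    (deposit ref (fixit (examine ref) loc)))
  (remprop lab 'undef)
  (setf (get lab 'sym) loc))
.END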
.SS(Compiling and Interpreting,,P240:)
Before adding more LISP constructs to our compiling algorithm
we should reexamine the relationships between compilers and interpreters.
The compilation of conditional expressions introduces an interesting
dichotomy between the action of an interpreter and that of a typical compiler.
We will restrict ourselves to a simple
form of the conditional expression:### %3if[p;then;otherwise]%1,##
where %3p%1 is a an expression giving %et%1 or %ef%1;
%3then%1 is the expression to be evaluated if %3p%1 gives %et%1; otherwise
the expression, %3otherwise%1,
is to be evaluated. It is an easy exercise to express a LISP conditional in terms
of a nested %3if%1 expression.
When an interpreter evaluates a conditional expression or
an %3if%1, it will
evaluate either %3then%1 or %3otherwise%1; not both. When a compiler compiles
code for an %3if%1 expression, it compiles %6both%1 branches.
This loss of symmetry is disconcerting. Certainly, we cannot only compile
%6one%1 branch of the %3if%1 and claim that we have faithfully
translated a conditional; however, compiling code for program segments
which may not be executed is also disconcerting.
For example, if a particular evaluation %6never%1 takes the %3otherwise%1 branch
of a conditional⊗↓That does %6not%1 imply that the %3otherwise%1-branch will
%6never%1 be visited.←, then we need not compile code for that branch.
At a later date, a different evaluation might take that branch, and at
that time, we should be able to compile it.
We will discuss more
implications of this behavior in {yonss(P239)} when we examine
the interactive programming aspects of LISP.
This section will deal with the mechanical aspects of reconciling
compilers with interpreters.
We will show that it is possible to interpret %6and%1 compile
at the same time⊗↓See ⊗↑[Mit#70]↑ for a similar idea
applied to a different language.←. The relevant observation is that much of the
compiling and interpreting algorithms are identical; they deal with
decoding the input expression and understanding which LISP constructs
are present. It is only after the interpreter or compiler has discovered
the nature
of the expression that the specifics of compilation or interpretation
come into play.
We will build an evaluator/compiler named %3evcom%1 based on the
explicit access evaluator of {yonss(P209)}.
It will handle compilation and interpretation of the application
of both primitive functions and named λ-definitions; it will
recognize the difference between local and non-local variable references,
compiling (and executing) calls on %3lookup%1 for non-local references,
and use a faster relative addressing technique for local variables.
Finally it will "incrementally compile" %3if%1 expressions,
executing (and generating code for) the branch designated by the
predicate; and will leave sufficient information available such that
if ever the other branch is executed, then code is compiled for it.
Before sketching %3evcom%1, one other implementation detail is worth mentioning.
We cannot simply replace a LISP expression with compiled code; LISP expressions
are data structures and we must be able to operate on that data structure
representation without being aware that a compilation has occurred.
For example the LISP editor ({yonss(P247)}) must be able to manipulate
the S-expr representation.
So rather than replace expressions with code we %6associate%1 the code with the
expression using an association list whose name-entries are LISP expressions
and whose value-entries are sequences of instructions⊗↓In
actual practice, such a representation would be prohibitively expensive
and we would substitute a hash array technique; see {yonss(P248)}.←.
The variable %3code%1 is used to hold the association list or code
buffer.
Finally here is the sketch. We have left out many of the subsidiary functions
and have left out all of the execution mechanism involved in %3xct%1
and %3execute%1;
%3xct%1 executes a single instruction and %3execute%1
basically is a combined assembler and execution
device.
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 15,21,32,40;turn off "←";
evcom <= λ[[exp]prog[[z]
\return[\[z ← hascode[exp] → execute[z];
\\ isconst[exp] → xct1[list[mksend[exp]]];
\\ isvar[exp] → \[z ← islocal[exp] → xct1[send_code[list[mklocal[z]]]];
\\\ z ← isfun[exp] → send_code[evcom1[def[z];()]];
\\\ issubr[exp] → send_code[list[mkpushj[exp]]];
\\\ %et%3 → xct1[send_code[list[mkglob[exp]]]]];
\\ isif[exp] → send_code[evif[exp]];
\\ %et%3 → send_code[mkcode[\xct1[list[mkalloc[vars[fun[exp]]]]];
\\\\evcomlist[args[exp]];
\\\\xct1[list[mkcall[fun[exp]]]]] ]]] ]]
evcom1 <= λ[[exp;code] evcom[exp]]
xct1 <= λ[[x] xct[x]; x]
.END
Here's the essence of %3evcom%1: if the expression has code in the current
code buffer, then we execute it without further processing. A constant
is executed and produces code, since that constant may be a subexpression
of a larger expression being compiled; we do not save the constant code in the
code buffer. Two types of variables are recognized: a local variable is
recognized by its presence in the local table; a relative address reference
can be given for that entry. More will be said about rapid access to
local variables in {yonss(P32)} and {yonss(P249)}. If the variable
reference is non-local, then we compile a version of %3lookup%1; the actual
form of that code will depend on the binding implementation (shallow or deep).
Either type of variable reference saves the code using %3send_code%1.
If the variable reference is to a function name, then we will have gotten
here through a function application. We must compile and execute code for that
function definition. We use the function %3evcom1%1 because the code buffer,
%3code%1, must be re-initialized for the compilation of the function body;
code compiled in the outer environment is not valid within the function body.
Finally, the variable might be a reference to a primitive function, in which case
we just return the call and let the function application execute it.
The %3if%1 statement is discussed below.
If the expression is an application, we generate (and execute) a sequence
of code to allocate space, compile and execute the argument list, and
if necessary compile, but always execute, the function call.
The code sequence is constructed using %3mkcode%1; you may think of
%3mkcode%1 as %3append%1.
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 22,27;turn off "←";
hascode <= λ[[exp] prog[[z]
\return[\[z ← findcode[exp] → cdr[z];
\\ %et%3 → %ef%1]]]]
.END
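The code buffer itself can be sketched directly; %3findcode%1 and %3send_code%1
are left unspecified in the text, so the definitions below are assumptions: an
association list keyed by the S-expression itself, searched with an %3equal%1
test. In the sketch above %3send_code%1 presumably finds %3exp%1 non-locally;
here it is written %3send-code%1 and is passed %3exp%1 explicitly.
.BEGIN NOFILL;GROUP;
;;; A minimal code buffer, in Common Lisp form.
(defvar *code* '())                                ; the association list

(defun findcode (exp)
  (assoc exp *code* :test 'equal))

(defun send-code (exp instrs)                      ; associate code with exp
  (push (cons exp instrs) *code*)
  instrs)
.END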
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 15,25
evcomlist <= λ[[l]\[null[l] → ();
\ null[rest[l]] → evcom[first[l]];
\ %et%3 → mkcode[\evcom[first[l]];
\\xct1[((NEXT))];
\\evcomlist[rest[l]]]]]
.END
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 18,27;turn off "←";
evif <= λ[[exp] prog[\[l p a b]
\l ← body[exp];
\p ← pred[l];
\a ← ante[l];
\b ← ow[l];
\p ← evcom[p];
\return[list[\[receive[] → mkifa[exp;p;evcom[a];b];
\\ %et%3 → mkifb[exp;p;a;evcom[b]]]]] ]]
.END
The compilation and execution of %3if%1 expressions is interesting.
When compiling the first reference to an %3if%1 instance, we compile
the predicate and one of the branches; we associate a structure
with the instance; that structure has either the name %3ifa%1 or %3ifb%1
depending on which branch was compiled. If we come across this instance
of %3if%1 again (either in a loop or in a recursion) then we find
the %3ifa%1 or %3ifb%1 entry in %3code%1. If we pick the same branch of the
%3if%1 then nothing new happens; but if the (compiled) predicate
evaluated to the other truth value, then we compile the other branch and
associate a completely compiled program with the original %3if%1 expression.
The construction of the completed code is the business of the next
function, %3mkifcode%1.
.BEGIN SELECT 3;GROUP;NOFILL;TURN ON "\";TABs 17,30;turn off "←";
mkifcode <= λ[[p;a;b]
\λ[[l;l1] mkcode[\p;
\\list[mkjumpf[l]];
\\a;
\\list[mkjump[l1]];
\\list[l];
\\b;
\\list[l1]]]
\ [gensym[];gensym[]] ]
.END
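For reference, %3mkifcode%1 is directly runnable once %3mkcode%1 is taken to be
%3append%1; the following Common Lisp rendering uses %3let%1 in place of the
λ-application which binds the two generated labels.
.BEGIN NOFILL;GROUP;
(defun mkjumpf (l) (list 'JUMPF 1 l))
(defun mkjump  (l) (list 'JUMP l))

(defun mkifcode (p a b)
  (let ((l (gensym)) (l1 (gensym)))      ; the two local labels
    (append p
            (list (mkjumpf l))           ; false: jump around the a-branch
            a
            (list (mkjump l1))           ; done: jump out
            (list l)
            b
            (list l1))))
.END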
.BEGIN TABIT1(16)GROUP;
%2Recognizers%3
islocal <= λ[[x]in[x;local[env]]]
isif <= λ[[x] eq[first[x];IF]]
isprim <= λ[[ins] get[ins;INST]]
isfun <= λ[[x] get[x; EXPR]]
%2Constructors%3
mklocal <= λ[[var] list[SEND@;var]]
mkglob <= λ[[x] list[LOOKUP;x]]
mkalloc <= λ[[vars] list[ALLOC;vars]]
mkcall <= λ[[fn] list[CALL;fn]]
.END
We have left out a significant amount of detail and we have only covered a
subset of LISP, but the result should be understandable; and it
should further clarify the relationships between compilation and
interpretation. Typical discussions of compilers and interpreters
lead one to believe that there is a severe flexibility/efficiency
tradeoff imposed in dealing with compilers. If you compile programs
you must give up a lot of flexibility in editing and debugging.
With a properly designed language and machine that is not true.
.SS(A compiler for Simple %3eval%*: The Value Stack,compiler,P32:)
We wish to add more details to our basic compiling algorithms. For simplicity,
we return to the pure compiler. We develop a more complex
compiler, leaving the details of a complex compiler/interpreter to the reader.
The major failing of the previous %3compexp%*
({yonss(P7)}) is its inability to handle
variables.
A related failing is its inability to
compile code for λ-definitions. We will alleviate both difficulties in this
section.
From {yon(P169)}, we know what %3compexp%* will do with:
←%3f[g[A];h[B]]%*
.BEGIN GROUP
.TABIT2(10,45);SELECT 3;
\(MOVE 1 (QUOTE A))\%1; get %3A%* into %31.
\(CALL G)\%1; call the function named %3g
\(PUSH P 1)\%1; save the value%3
\(MOVE 1 (QUOTE B))\%1; get %3B%1 into %31
\(PUSHJ P H)\%1; call %3h
\(MOVE 2 1)\%1; restore the arguments in%3
\(POP P 1)\%1; preparation for%3
\(CALL F)\%1; calling %3f
.END
No surprises yet. What would we expect to see for a compiled version of:
←%3f[g[x];h[y]]%* ?
We %6should%* expect to see the same code except that we would have
instructions to send the values of %3x%* and %3y%* into %31%* at the
appropriate time. So the first problem is
how to find the values of variables.
Assume we are really interested in compiling:
←%3j <= λ[[x;y] f[g[x];h[y]]]%*
This added problem makes our question easier. Consider a call on %3j%*: %3j[A;B]%*,
for example. We know that the execution of the call occurs after the values %3A%* and
%3B%* have been set up in %3AC1%* and %3AC2%*.
Thus at %6that%* time we do indeed know what the
values of %3x%* and %3y%* are supposed to be.
For sake of simplicity, assume that the variables %3x%1 and %3y%1 are
strictly local. That is,
no one within the bodies of
either %3g%* or %3h%* uses %3x%* or %3y%* free; we will worry about
compilation of free references later.
Since they are local, then only %3j%1 needs to find their values.
We cannot leave the values in the %3AC%1s since those registers are
needed for other computations.
Rather, we will save %3x%1 and %3y%1 in the top of the stack %3P%*.
Since %3P%1 contains the
values of partial computations, and now also contains the values of the
local λ-variables, %3P%* is also called the %2⊗→value stack↔←%*. This is a
value stack similar to that
described in deep-binding ({yonss(P148)}); however we do not need the name stack
here. This is because the compiler will %6know%* where on the stack values of
local variables can be found. It will put them there so it %6should%* know.
This lack of a name stack is a mixed blessing; we save space, but
we have lost the names, which
are useful if we are debugging code.
Note %3P%1 is not solely a value stack; it also contains the control information.
We are not always able to mix access and control information on one stack;
in fact, we know that a stack is not always a sufficent vehicle for
describing LISP's access requirements. However, a very large subset
of LISP %6does%1 allow a single-stack implementation, and we will be
compiling within that subset for most of this chapter.
Addressing the task at hand, the instructions for the body of %3j%1
will be very similar to those displayed for %3f[g[A];h[B]]%1.
We will generate instructions to save the values on the actual parameters by
prefixing the %3compexp%*-code with two %3PUSH%* operations:
.BEGIN TABIT1(34);SELECT 3;GROUP
\(PUSH P 1)
\(PUSH P 2)
.END
.P163:
After execution of this pair of instructions,
called the %2prolog%1, the value of %3y%* is on the
top of the stack, and the value of %3x%* is the next element down⊗↓The observant
reader will note that the %3PUSH%* for %3x%* is unnecessary.
Since we have assumed that %3x%* and %3y%* are strictly local, and since no one
else needs the value of %3x%* except for %3g[x]%*, we can simply compute
%3g[x]%* directly. One might also think that we could leave
%3B%* in %3AC2%* while we calculated %3g[x]%*; we cannot do that, as %3g[x]%*
might use %3AC2%*. We must %3PUSH%* %3y%*.←.
Now that we have saved the values, we need instructions to %3send%1 them to
%3AC1%* when the value is needed. We will implement %3send@%1 using
the %3MOVE%* instruction ({yonss(P33)}). In this case
our memory reference will be relative to the top of the stack %3P%*.
Relative addressing is described to our machine
with a memory field of the form "%3n#P%*", where %3n%* designates the offset into
%3P%* and references the %3n%8th%1 element, counting
backwards from zero. Thus in our
current example "%30#P%*" refers to the value for %3y%* and "%3-1#P%*" refers to
%3x%* at the time %3j%* is entered.
Be sure to realize also that our addressing is relative; though %3"0#P"%* refers
to %3y%* at entry time, %30#P%* will %6not%* refer to %3y%* when we have pushed
something onto the stack.
Note also that we %6cannot%* change our relative addressing to hard machine
locations in the assembler. The addressing must always be relative. We will be
compiling code for recursive functions. Each recursive call must get a fresh
segment of the value stack in which to store its results. A similar problem appeared
when we examined the %3CALL-RET%* mechanism on {yon(P167)}. There we were dealing
with control information stored on a stack.
Finally, we cannot leave the code for %3j%* as it stands. If the prolog pushes
two entries onto the stack then we had better construct an epilog to remove them;
otherwise the stack will not be
in the state expected by the calling program. As we leave %3j%* we subtract
%32%* from the pointer %3P%* to synchronize the stack.
The constant %32%1 is designated as %3(C#2)%1.
Finally we exit via %3RET%*.
One further embellishment is needed: since we are defining a function and
turning it into compiled code, we must preface the code sequence with information to
our assembler to designate %3j%* as a %3SUBR%*. The assembler will make a new
property-value pair consisting of the property name %3SUBR%1 and an associated
property value indicating the location in BPS which contains the beginning of
the code for the procedure. That pair is placed on the p-list of the
atom representing the function name.
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
.P165:
\(LAP J SUBR)\%1; says %3j%* is a function%3
\(PUSH P 1)\%1; save the input args%3
\(PUSH P 2)
\(MOVE 1 -1 P)\%1; get %3x
\(CALL G)\%1; call the function named %3g
\(PUSH P 1)\%1; save the value%3
\(MOVE 1 -1 P)\%1; get %3y
\(PUSHJ P H)\%1; call %3h
\(MOVE 2 1)\%1; set up the arguments in%3
\(POP P 1)\%1; preparation for%3
\(CALL F)\%1; calling %3f
\(SUB P (C 2))\%1; synchronize the stack by removing
\\; the two saved args.%3
\(RET)\%1; exit with %3AC1%* containing the value of %3j[x;y].
.END
As you read the code and as you study its execution you should remember that
the addressing in the code is relative to the top of the stack: %3(MOVE#1#-1#P)%*
gets us %3x%* in one instance and gets us %3y%* in another, because the top
of the stack changes.
Here is a picture of the execution of the code:
.BEGIN TURN ON "\";NOFILL;TABS 5,10,25,30,45,50,65,70;SELECT 3;GROUP;
AC1: x ; AC2: y\\\AC1: x ; AC2: y\AC1: x ; AC2: y
\ \ (PUSH P 1)\\ (PUSH P 2)\| y\|(MOVE 1 -1 P)\| y | (CALL G)
\|\| =>\| x\| =>\| x\| =>\| x | =>
AC1: g[x] ; AC2: ?\\\AC1: y ; AC2: ?
\\ (PUSH P 1)\|g[x]\|(MOVE 1 -1 P)\|g[x]\| (PUSHJ P H)
\| y\| =>\| y\| =>\| y\| =>
\| x\|\| x\|\| x\|
AC1: h[y] ; AC2: ?\AC1: h[y] ; AC2: h[y]\AC1: g[x] ; AC2: h[y]
\|g[x]\| (MOVE 2 1)\|g[x]\| (POP P 1)\| y |(CALL F)
\| y\| =>\| y\| =>\| x | =>
\| x\|\| x\|
\ \
AC1: f[g[x];h[y]]
\| y\|(SUB P (C 2))\\\ \ (RET) %1return to caller.%*
\| x\| => \\=>\|\|
.END
.SS(A Compiler for Simple %3eval%*,compiler,P249:)
.BEGIN TURN ON "#";
Now that we know what the runtime code for local variable references %6could%* be,
the point now is to get a compiler
to generate code which `knows'
where on the stack we can find the value of a local variable when
we execute that code. What we shall do is simulate the behavior of
the runtime stack while we are compiling the code. The compiler
cannot know what the %6values%* of the variables will be at run time but
it can know %6where%* to find the values. We will carry
this information through the compiler in a manner reminiscent of the
`association#list' symbol table of the %3eval%* introduced in {yonss(P6)}.
Instead of
posting the current values in the stack, the compiler will post information about the
positions of the variables relative to the top of the stack at the
time we enter the function. The variable-position list, %3vpl%*,
contains this information. If we are
to compile a function with λ-variables, %3[x;y;z]%* then %3vpl%* will begin
with:
←%3((X . 1) (Y . 2) (Z . 3) ...%*
When we set up %3vpl%*
we also set the %2⊗→offset↔←%*, called %3off%*, to minus the number of
arguments added
to %3vpl%*, in this case -3. Now if we come across a reference, say to
%3Y%*, while compiling code, we use %3cdr[assoc[Y;vpl]]%* to retrieve %32%*. The
offset plus this retrieved value gives us the relative position of %3Y%*
in the stack:#-3#+#2#=#-1. Thus to refer to the
location of %3Y%* we use %3(...#-1#P)%*.
What happens as we add
elements to the stack? Or to be more precise, what happens as we
generate code which when executed will add elements to the stack?
Clearly we must modify the offset. If we add one element, we would
set %3off%* to -4. Then to reference %3Y%* we now use -4 + 2 = -2; that is, a
reference to %3Y%* is now of the form:
←%3( ......-2 P).%*
But that's right. %3Y%* is now further down in the run time stack. Thus
the `symbol table' is really defined by %3off%*
plus the current %3vpl%*. Here's a sketch of the proposed %3compexp%*
in its performance of local variable recognition.
.BEGIN ;GROUP;SELECT 3;CENTERIT;
←islocalvar[exp] → list[mkvar[1;loc[exp;off;vpl]]]
%1where:%3
←loc <= λ[[x;off;vpl] plus[off;cdr[assoc[x;vpl]]]]
%1and,%3
←mkvar <= λ[[ac;mem] list[MOVE;ac;mem;P]]
.END
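The offset arithmetic is easy to check by running it. The sketch below, in
Common Lisp form, assumes the %3vpl%1 of the text; the sample values reproduce
the -3#+#2#=#-1 calculation above.
.BEGIN NOFILL;GROUP;
(defun loc (x off vpl) (+ off (cdr (assoc x vpl))))
(defun mkvar (ac mem) (list 'MOVE ac mem 'P))

;;; With off = -3 and vpl = ((X . 1) (Y . 2) (Z . 3)):
;;; (loc 'Y -3 '((X . 1) (Y . 2) (Z . 3)))  =>  -1
;;; (mkvar 1 -1)                            =>  (MOVE 1 -1 P)
;;; after one more PUSH the offset is -4 and the same reference
;;; assembles as (MOVE 1 -2 P).
.END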
Next, when will the compiler make modifications to the top of the stack?
We push new elements when we are compiling the arguments to a function
call. We know that %3complis%* is the function which compiles the argument list.
Thus our new %3complis%* had better know about %3off%* and %3vpl%*, and
since %3complis%* changes the state of the stack, then it had better
change %3off%* appropriately:
.BEGIN SELECT 3; TABIT2(22,32);GROUP;CENTERIT;
complis <= λ[[u;off;vpl]\[null [u] → ( );
\ null[rest[u]] → compexp[first[u];off; vpl];
\ %Et%* → append\[compexp [first[u]; off; vpl];
\\ list[mkalloc[1]];
\\ complis [rest[u]; sub1[off]; vpl]]]]
.END
Notice that %3complis%* compiles the arguments from left to right,
following each with %3(PUSH#P#1)%* and recurring with a new offset
reflecting the effect of the %3PUSH%*. This function is analogous to
%3evlis%*.
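The offset threading can be watched directly if %3compexp%1 is stubbed out. In
the hedged Common Lisp sketch below the stub simply records which offset each
argument was compiled with; the stub and its output format are assumptions,
not part of the compiler.
.BEGIN NOFILL;GROUP;
(defun mkalloc (n) (list 'PUSH 'P n))
(defun compexp-stub (e off vpl)
  (declare (ignore vpl))
  (list (list 'CODE-FOR e 'AT off)))

(defun complis (u off vpl)
  (cond ((null u) '())
        ((null (rest u)) (compexp-stub (first u) off vpl))
        (t (append (compexp-stub (first u) off vpl)
                   (list (mkalloc 1))
                   (complis (rest u) (- off 1) vpl)))))

;;; (complis '((G X) (H Y)) -2 '((X . 1) (Y . 2)))
;;;   =>  ((CODE-FOR (G X) AT -2) (PUSH P 1) (CODE-FOR (H Y) AT -3))
.END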
.GROUP;
Here's a brief description of the parts of the new compiler⊗↓This compiler was
adapted from one written
by J.#McCarthy (⊗↑[McC#76]↑), and proved correct by R.#London#(⊗↑[Lon#71]↑)
and M.#Newey (⊗↑[New#75]↑).←.
.BEGIN INDENT 0,17;
%3compile[fn;vars;exp]: fn%* is the name of the function to be compiled. %3vars%* is the
list of lambda variables. %3exp%* is the lambda-body.
%3prup[vars;n]: vars%* is a lambda list, %3n%* is an integer. %3prup%* builds a
variable-position list.
.END
.APART
.BEGIN INDENT 0,19;GROUP;
%3compexp[exp;off;vpl]%*: This function generates the code for constants and
for references to variables.
If the variable is local, a simple %3send%1 is generated, otherwise
a call on %3lookup%1 results.
If a conditional expression is recognized, %3comcond%1 is called to
produce the code.
If %3exp%1 does not fit one of these categories, it is assumed to be
an application of a call-by-value function.
In this case, %3complis%* compiles the argument list,
leaving the arguments in the stack; %3loadac%* loads the appropriate %3AC%*'s.
and then we generate
a call on the function, and finally generate
the %3SUB%* to synchronize the stack.
.END
.BEGIN INDENT 0,20;GROUP;
%3comcond[u;glob;off;vpl]%*: this compiles the body of conditional expressions. %3u%* is the
p%4i%*#-#e%4i%* list; %3glob%* will be bound to a generated symbol name;
%3off%* and %3vpl%* will always be the offset and the variable-position list.
.END
.END
Fortified by the previous %3compile%* functions and this introduction
the new %3compile%* should be clear.
.BEGIN TABIT2(12,24);select 3;TURN OFF "←";GROUP
compile <= λ[\[fn;vars;exp]
\λ[[n] append[\mkprolog[fn;n];
\\compexp[exp; -n; prup[vars;1]];
\\mkepilog[n]]]
\ [length[vars]] ]
.END
.BEGIN TABIT1(18);select 3;TURN OFF "←";GROUP;
mkprolog <= λ[[f;n] concat[list[LAP;f;SUBR];mkpushs[n;1]]]
mkpushs <= λ[[n;m][\lessp[n;m] → ( );
\%Et%3 → concat[mkalloc[m]; mkpushs[n;add1[m]]]]]
mkepilog <= λ[[n] list[mksync[n];mkret[]]]
mksync <=λ[[n] list[SUB;P;list[C;n]]]
mkret <=λ[[] (RET)]
.END
.BEGIN TABIT1(17);select 3;TURN OFF "←";GROUP;
prup <= λ[[vars;n][\null[vars] → ( );
\%et%3 → concat[cons[first[vars]; n];prup[rest[vars];add1[n]]]]]
.END
.BEGIN TABIT3(23,39,53);select 3;TURN OFF "←";GROUP;
compexp <= λ[[exp;off;vpl]\[isconst[exp] → list[mkconst[1;exp]];
\ islocalvar[exp] → list [mkvar[1;loc[exp;off;vpl]]];
\ isnonlocal[exp] → list[mklookup[exp]];
\ iscond[exp] → comcond[args%4c%3[exp];gensym[ ];off; vpl];
\ isfun+args[exp] →\λ[[z] compapply[\func[exp];
\\\complis[z;off;vpl];
\\\length[z]]]
\\ [arglist[exp]] ]]
.END
%3compapply%1 is found on {yon(P82)}.
.BEGIN TABIT2(25,35);select 3;TURN OFF "←";GROUP;
comcond <= λ[[u;glob;off;vpl][\null[u] → list[mkerror[ ];glob];
\%Et%3 → append[\comclause[first[u]; gensym[];glob;off;vpl];
\\comcond[rest[u]; glob;off;vpl] ]]]
.END
.BEGIN TABIT1(34);select 3;TURN OFF "←";GROUP;
comclause <=λ[[p;loc;glob;off;vpl]append[\compexp[ante[p];off;vpl];
\list[mkjumpf[loc]];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]]]
.END
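Before tracing the compiler, the bookkeeping functions can be checked in
isolation. Here is a hedged Common Lisp rendering of the prolog and epilog
constructors and of %3prup%1; the expected values match the pieces that appear
in the trace which follows.
.BEGIN NOFILL;GROUP;
(defun mkalloc (m) (list 'PUSH 'P m))
(defun mkpushs (n m)
  (if (< n m) '() (cons (mkalloc m) (mkpushs n (+ m 1)))))
(defun mkprolog (f n) (cons (list 'LAP f 'SUBR) (mkpushs n 1)))

(defun mksync (n) (list 'SUB 'P (list 'C n)))
(defun mkret () '(RET))
(defun mkepilog (n) (list (mksync n) (mkret)))

(defun prup (vars n)
  (if (null vars)
      '()
      (cons (cons (first vars) n) (prup (rest vars) (+ n 1)))))

;;; (mkprolog 'J 2)  =>  ((LAP J SUBR) (PUSH P 1) (PUSH P 2))
;;; (mkepilog 2)     =>  ((SUB P (C 2)) (RET))
;;; (prup '(X Y) 1)  =>  ((X . 1) (Y . 2))
.END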
Here is a partial sketch of %3compile%* operating on %3j <= λ[[x;y]f[g[x];h[y]]]%*.
Compare the code it generates with the code we saw on {yon(P165)}.
.BEGIN SELECT 3;TABIT2(10,17);CENTERIT;
.GROUP
compile[J;(X Y);(F (G X) (H Y))]
%1gives:%*
\append\[((LAP J SUBR));
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ compexp[(F (G X) (H Y));-2;prup[(X Y);1]];
\\ ((SUB P (C 2))
\\ (RET))] %1
.APART
.GROUP
where:
←%3prup[(X Y);1]%* gives %3((X . 1) (Y . 2))%*.
.APART
.GROUP
%3compexp[(F (G X) (H Y));-2;((X . 1) (Y . 2))]%*
results in:
%3
\append\[complis[((G X) (H Y));-2;((X . 1) (Y . 2))];
\\ mklink[2];
\\ ((CALL F))]
%1and %3mklink[2]%* evaluates to: %3((MOVE 2 1) (POP P 1))%*.
.APART
.GROUP
Thus the code we're getting looks like:
%3
\\((LAP J SUBR)
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ complis[((G X) (H Y));#-2;#((X . 1) (Y . 2))]
\\ (MOVE 2 1)
\\ (POP P 1)
\\ (CALL F)
\\ (SUB P (C 2))
\\ (RET) )
.APART
.FILL;
%3complis%1 is interesting since it actually uses the %3vpl%* we have been
carrying along. It gives rise to:
.NOFILL
.GROUP
%3
\append\[compexp[(G X);-2;((X . 1) (Y . 2))];
\\ ((PUSH P 1));
\\ complis[((H Y));-3;((X . 1) (Y . 2))]]
.APART
.GROUP
%1and the %3compexp%* computation involves, in part:
%3
\append[complis[(X);-2;((X . 1) (Y . 2))];
\\ ((CALL G))]
.APART
.GROUP
%1Finally this %3complis%* generates the long awaited variable reference using:
%3compexp[X;-2;((X . 1) (Y . 2))] %1giving,##%3((MOVE 1 -1 P))%*.
.APART
.GROUP;
So our code is:
%3
\\((LAP J SUBR)
\\ (PUSH P 1)
\\ (PUSH P 2)
\\ (MOVE 1 -1 P)
\\ (CALL G)
\\ (PUSH P 1)
\\ complis[((H Y));#-3;#((X . 1) (Y . 2))]
\\ (MOVE 2 1)
\\ (POP P 1)
\\ (CALL F)
\\ (SUB P (C 2))
\\ (RET) )%1
Notice that the offset is different within the call:
←%3 complis[((H Y));-3;((X . 1) (Y . 2))].%1
But that is as it should be: there is an extra value on the stack now.
.END
.APART
.CENT(Problems)
%21.%1 Complete the code generation for the above example.
.P36:
%22.%* Extend the compiling algorithm to recognize anonymous λ-expressions.
.SS(Efficient Compilation)
.P35:
We have discussed compilation at two different levels: we can
translate LISP expressions into sequences of the LISP control primitives
of {yonss(P230)}; or we can translate into the instructions of the
%2SM%1 machine of {yonss(P33)}. We conceptualized the compilers in terms of
higher level, but biased many of our choices towards
implementations in terms of the %2SM%1 instruction
set. Our choices influenced the efficiency of the resulting compiler.
We should first clarify what we mean by efficiency in this context.
If the compiler produces code for the LISP primitives and then we encode the LISP
primitives in terms of the %2SM%1 instruction set, then we get a simple compiler
which tends to produce inefficient code; inefficient, in terms of the %2SM%1 machine,
not in terms of the LISP primitives. Such a compiler would be efficient
in terms of compilation time and
might suffice for debugging runs or student projects.
More likely, efficient compilation is taken to mean production of
code which we could expect from a reasonably bright machine-language programmer.
It should run reasonably fast, not have obviously redundant instructions, and
not take too much space in the machine. It is this second interpretation
of efficiency which we shall use. In this interpretation we
don't simply implement the LISP primitives, but take a more global view
of the underlying machine. We take advantage of more of the hardware
features, incorporating them deeper into the structure of the compiler.
This process is called optimization.
Optimization attacks the mismatch between the programming language
and the hardware machine.
The result is a compiler which is much more machine dependent,
requires more processing time, but produces much better code for
that specific machine.
Examination of the compilation of even the
most simple function suggests many possible improvements,
given the %2SM%1 machine.
A major inefficiency occurs in saving and restoring quantities on the
stack. This is a symptom of a more serious disease:
the compiler does not remember what will be in the ACs at run-time. Since
we are assuming that the
arguments to a function call are to be passed through the %3AC%1's,
and since it is expensive to save and restore these registers, we should
make a concerted effort to remember what quantities are in which ACs and
not reload them unnecessarily. But note that this optimization is dependent
on the hardware of our machine; if we had only one %3AC%1, the trick would
not be applicable.
.SS(Efficiency: Primitive Operations,,P166:)
First we should be able to generate references to constants into %3AC%*'s
other than %3AC1%*.
.GROUP;
For example, the call on %3f[1;A]%* should be generated
as:
.BEGIN GROUP;TABIT1(33);SELECT 3;
\(MOVEI 1 1)
\(MOVEI 2 (QUOTE A))
\(CALL F)
.END
There is no reason to save constants in the stack.
.APART
We should also expect that the LISP primitive operations, %3car, cdr, cons, eq,%*
and %3atom%* should occur rather frequently in compiled code; and we
should expect that a reasonable compiler be cognizant of
their existence and compile more efficient code for their execution.
In this section we will enlarge the instruction set of our machine, adding
plausible operations for some of these primitives⊗↓Some of these instructions
exist on the PDP-10. HLRZ and HRRZ are used for %3car%1 and %3cdr%1,
respectively; and the version of the PDP-6
which was delivered to Stanford had a hardware %3cons%1 operation.←.
In the description of these new instructions
%3ac%* will always refer to an %3AC%*-register; %3loc%* will be either
an %3AC%* or a memory location, and %3mem%* will be reserved for memory
references only.
%3CAR%* is an instruction, taking two arguments:
an %3ac%* and a %3loc%*
respectively. The %3car%* operation is performed from %3loc%* to %3ac%1.
For example when compiling the call, %3f[1;car[x]]%*,
we want to get %3car[x]%* in %3AC2%*. If %3x%* were in %3-5#P%* then we could
accomplish our loading directly by
%3(CAR 2 -5 P)%1 instead of:
.BEGIN TABIT1(33);SELECT 3;GROUP;
\(MOVE 1 -5 P)
\(CALL CAR)
\(MOVE 2 1)
.END
We can also exploit the fact that the second argument to %3CAR%* is a %3loc%*:
the second argument to %3f[1;car[car[x]]]%* could have been compiled as:
.BEGIN TABIT1(33);GROUP;SELECT 3;
\(CAR 2 -5 P)
\(CAR 2 2)
.END
We will assume the existence of an analogous %3CDR%* instruction. With
these two instructions we can significantly improve the code for
%3car-cdr%*-chains.
Another source of efficiency is available to us. Consider the clause:
.BEGIN CENTER;SELECT 3;
[eq[x;A] → B; ...]
.END
.BEGIN GROUP;
Assuming that %3x%* were on the top of the stack, our current compiler
would generate:
.TABIT1(33);SELECT 3;
\ (MOVE 1 0 P)
\ (MOVEI 2 (QUOTE A))
\ (CALL EQ)
\ (JUMPF 1 L1)
\ (MOVEI 1 (QUOTE B))
\ (JUMP LOUT)
\L1 ...
.END
The use of predicates in this context does not require
construction of the constants %et%* and %ef%*. All we need to do is implement
the %3eq%* test as a jump to one of two locations.
We will introduce an instruction %3CAME%* taking two arguments:
the first an %3ac%* and the second a %3loc%*. %3CAME%* compares
the contents of the two arguments, and if they are equal, it skips
the next instruction.
.BEGIN TABIT1(33);GROUP;
Thus the above example could be compiled as:
.SELECT 3;
\ (MOVEI 1 (QUOTE A))
\ (CAME 1 0 P)
\ (JUMP L1)
\ (MOVEI 1 (QUOTE B))
\ (JUMP LOUT)
\L1 ...
.END
Notice that we have added an extra piece of knowledge to the compiler;
it knows that %3eq%* is commutative in this instance⊗↓If there are
side-effects in the computation of the arguments, the
order can make a difference. However unless explicitly stated our compilers do
not have to consider side-effects.←.
We still require some artifacts in the compiler to generate full procedure
calls on
predicates particularly since predicates may return values
other than %et%1 and %ef%1. But in many instances,
particularly within %3comcond%*, we can expect to generate tighter code.
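As an illustration, here is a hedged Common Lisp sketch of the special
treatment a clause compiler might give an %3eq%1-test against a constant.
The helper name %3comeq%1 and its argument conventions are assumptions, not
part of the compiler above; %3loc%1 is the relative-address calculation of
{yonss(P249)}.
.BEGIN NOFILL;GROUP;
(defun loc (x off vpl) (+ off (cdr (assoc x vpl))))
(defun mkjump (l) (list 'JUMP l))

(defun comeq (var const lint off vpl)
  (list (list 'MOVEI 1 (list 'QUOTE const))      ; constant into AC1
        (list 'CAME 1 (loc var off vpl) 'P)      ; equal: skip the jump
        (mkjump lint)))                          ; not equal: next clause

;;; With x on top of the stack (off = -1, vpl = ((X . 1))):
;;; (comeq 'X 'A 'L1 -1 '((X . 1)))
;;;   =>  ((MOVEI 1 (QUOTE A)) (CAME 1 0 P) (JUMP L1))
.END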
.SS(Efficiency: Calling Sequences)
We want to integrate the new compiling techniques into our
compiler. Since LISP depends heavily on procedure calls, the
computation of parameter lists and procedure calls is an area
of great concern to the designer of a LISP compiler.
Here is the code which the current compiler will produce for the expression
%3f[1;g[3];#car[x]]%*:
.BEGIN tabit1(33);SELECT 3;GROUP
\(MOVEI 1 1)
\(PUSH P 1)
\(MOVEI 1 3)
\(CALL G)
\(PUSH P 1)
\(MOVE 1 -2 P)
\(CALL CAR)
\(MOVE 3 1)
\(POP P 2)
\(POP P 1)
\(CALL F)
.END
By way of motivation and introduction, here is what our next compiler does for the
same call:
.BEGIN TABIT1(33);SELECT 3;GROUP;
\(MOVEI 1 3)
\(CALL G)
\(MOVE 2 1)
\(CAR 3 0 P)
\(MOVEI 1 1)
\(CALL F)
.END
Examination of the code shows the results of several optimization techniques.
We are using the %3CAR%* instruction of the last section.
We are also doing operations into %3AC%*'s other than %3AC1%*. This
allows us to remove some of the obnoxious %3PUSH-POP%* sequences.
The major modification involves an analysis of the arguments being compiled for
a function call.
Much of LISP's activity involves function calls. Much of the current
compiler's inefficiency involves generation of arguments to those calls.
This is a bad combination.
Thus we should concentrate some effort on this area of the compiler.
That part is %3complis%*.
Within our new %3complis%* we will
divide the arguments into two classes:#trivial and complex.
Since most of our worry is about the optimization of the %3AC%1's,
we will make %3complis%1 the major state of the compiler. We can define
%3compexp%1 as:
.BEGIN SELECT 3;CENTER
compexp <= λ[[exp;vpl;off] complis[list[exp];vpl;off]]
.END
%3complis%1 is the natural place to deal with register allocation since
it is responsible for the compilation of the actual parameters. The alternative
would be to pass the %3AC%1 destination to %3compexp%1. That scheme
becomes quite complex if dealt with consistently. So %3complis%1 becomes
the kernel function and must examine each argument to a function call.
Trivial arguments are those which need make no demands on the runtime stack;
the computation they entail can all be done in the %3AC%* registers. Thus
the code that the compiler generates need not involve %3PUSH-POP%* sequences.
For example, references to constants need not be generated and then pushed
onto the stack; we can compile the other arguments first and then, just before
we call the function, load the appropriate %3AC%* with that
constant. A similar argument can be used for postponing the loading of
variable references⊗↓But note that the argument for variables is shaky;
if our compiler handled programs with side-effects then we could %6not%*
be sure that the postponed value would be the same as that gotten if
we had loaded it at the "proper" time.←. The third trivial construct for this
%3complis%* is the handling of %3car-cdr%* chains. We will use our augmented
instruction set to perform computation of %3car%*s and %3cdr%*s directly
to a specified %3AC%*.
Complex arguments are those which require some non-trivial computation;
each non-trivial
computation will be prefaced with a %3PUSH%1 to save the current contents of
%3AC1%1.
Besides the compilation of efficient
code we would also like to make the compiler efficient. We would like to make the
compiling process as one-pass as possible. Our basic tasks in the new %3complis%*
are classification of the arguments and compilation of the code. With a little
care we can do both at the same time. There is nothing problematic about
the compilation of the trivial code⊗↓That's why it's trivial!←. We thus
turn to the complex code.
The old %3complis%* generated a block <code %3arg%4i%1>-%3PUSH%1 on each
cycle. That code was followed by a %3MOVE%1 to move the last value
from %3AC1%1 to %3ACn%1. In the previous compiler %3compexp%1 was the major
function; it handled the bulk of the code generation. Here %3complis%1 will be the
major function. The old %3complis%1 had three states: empty argument list,
singleton argument list, and otherwise condition. The new %3complis%1 has two states;
this is done to make %3complis%1 shorter.
On each cycle through %3complis%1 we generate a
%3PUSH%*-<code#%3arg%4i%1> sequence. Now we have a spurious
%3PUSH%* on the %6front%* of the sequence; one %3rest%* will take care of %6that%*.
We must also generate a list of %3POP%*s to suffix to the complex code
to get the saved values back into the proper %3AC%*'s: one pop for each argument.
The %6last%* %3POP%* should be modified to be a %3MOVE%1 since
we have not generated the corresponding %3PUSH%*. The memory
field of the last %3POP%* has the needed information; it tells us where
the %3MOVE%* we want to make should go:
.BEGIN CENTER;SELECT 3;
(POP P N) => (MOVE N 1)
.END
.GROUP
This modified list of %3POP%*s is added to the code sequence, followed by any
trivial code which we may have
generated. Note that this reordering is strictly an efficiency consideration
under the assumption that the %3AC%1's are being used to
simulate a temporary %3dest%1 block, which will
immediately become a block of local bindings, and which are subject to
local use only.
With this introduction, here is %3complis%* and
friends:
%3complis <= λ[[u;off;vpl] complis%9'%*[u;off;off;vpl;();();();1]]%1
.APART
.BEGIN turn on "\"; no fill;TABs 12,20,30;SELECT 3;TURN OFF "←";
.GROUP
complis%9'%* <= λ[[u;org;off;vpl;triv;cmplx;pop;ac]
\[null[u] →\[null[cmplx] → triv;
\\ %et%* → append\[rest[cmplx];
\\\ list[mkmove[mem[first[pop]];1]];
\\\ rest[pop];
\\\ triv]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,35,63;SELECT 3;TURN OFF "←";
.GROUP
\ isconst[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\concat[mkconst[ac;first[u]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,33,63;SELECT 3;TURN OFF "←";
.GROUP
\ isvar[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\concat[mkvar[ac;loc[first[u];off;vpl]];triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,36,43;SELECT 3;TURN OFF "←";
.GROUP
\ iscarcdr[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\off;
\\\vpl;
\\\append[\reverse[compcarcdr[ac;first[u];off;vpl]];
\\\\triv];
\\\cmplx;
\\\pop;
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,34,41,47,54;SELECT 3;TURN OFF "←";
.GROUP
\ iscond[first[u]] → complis%9'%*[\rest[u];
\\\org;
\\\sub1[off];
\\\vpl;
\\\triv;
\\\append[\cmplx;
\\\\concat[\mkpush[1];
\\\\\comcond[\args%4c%3[first[u]];
\\\\\\gensym[];
\\\\\\off;
\\\\\\vpl]]];
\\\concat[mkpop[ac];pop];
\\\add1[ac]];
.END
.BEGIN turn on "\"; no fill;TABs 12,23,30,36,50,57;SELECT 3;TURN OFF "←";
.GROUP
\ %et%* → complis%9'%*[\rest[u];
\\org;
\\sub1[off];
\\vpl;
\\triv;
\\append[\cmplx;
\\\concat[\mkpush[1];
\\\\λ[[z] compapply[\func[first[u]];
\\\\\complis[\z;
\\\\\\off;
\\\\\\vpl];
\\\\\length[z]]]
\\\\ [arglist[first[u]] ]];
\\concat[mkpop[ac];pop];
\\add1[ac]]]
.END
.BEGIN CENTERIT;SELECT 3;
mkmove <= λ[[ac;loc][eq[ac;loc] → (); %et%* → list[MOVE;ac;loc]]]
.END
.BEGIN TABS 6,15,24,42;NOFILL;SELECT 3;TURN OFF "←";GROUP;TURN ON"\";
compcarcdr <= λ[[ac;exp;off;vpl]
\\[isvar[arg[exp]] → list[mkcarcdr[\func[exp];
\\\\ac;
\\\\loc[arg[exp];off;vpl]]];
\\%Et%* → concat[\mkcarcdr_ac[func[exp];ac;ac];
\\\compcarcdr[ac;second[exp];off;vpl]]]]
.APART
.GROUP
iscarcdr <=λ[[u]\[iscar[u] → iscarcdr[arg[u]];
\\ iscdr[u] → iscarcdr[arg[u]];
\\ atom[u] → %et%*;
\\ %et%* → %ef%* ]]
iscar <= λ[[x] eq[func[x];CAR]]
iscdr <= λ[[x] eq[func[x];CDR]]
mkcarcdr <=λ[[carcdr;ac;loc] concat[carcdr;concat[ac;loc]]]
.END
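A simplified, runnable variant of the %3car-cdr%* chain compiler may help; it
collapses the constructors %3mkcarcdr%1 and %3mkcarcdr_ac%1 into direct list
building, and the sample %3vpl%1 is chosen so that %3x%1 sits at %3-5#P%*,
reproducing the code shown in {yonss(P166)}. The selector conventions here are
assumptions.
.BEGIN NOFILL;GROUP;
(defun loc (x off vpl) (+ off (cdr (assoc x vpl))))

(defun compcarcdr (ac exp off vpl)
  ;; builds the chain outermost-first; complis reverses it before use
  (if (atom (second exp))
      (list (list (first exp) ac (loc (second exp) off vpl) 'P))
      (cons (list (first exp) ac ac)
            (compcarcdr ac (second exp) off vpl))))

;;; (reverse (compcarcdr 2 '(CAR (CAR X)) -5 '((X . 0))))
;;;   =>  ((CAR 2 -5 P) (CAR 2 2))
.END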
.SS(Efficiency: Predicates)
We have already noted in {yonss(P166)} that some efficiencies are possible in
the handling of predicates inside of conditional expressions. Here we will examine
more possibilities. The first point of contention is that the current
%3compclause%* is %6not%* good enough. We want to be able to use the Boolean
special forms: %3and[u%41%*;#...;u%4n%*]%1 and
%3or[u%41%*;#...;u%4n%*]%1. The definition of these constructs required
they not evaluate any more arguments than necessary.
We can use this property when using %3and%1 and %3or%1 as predicates in
conditional expressions.
We will add recognizers for %3and%* and %3or%* inside
%3compclause%* and will add a new section to the compiler to deal with
their compilation.
First, here is the structure of typical code sequences:
.BEGIN TABIT2(10,40);SELECT 3;GROUP;
\and[u%41%*; ... u%4n%*] → e;\or[u%41%*; ... u%4n%*] → e;
\ %1gives: \gives:%3
\ <code for u%41%*>\ <code for u%41%*>
\ (JUMPF 1 lint)\ (JUMPT 1 loc)
\ <code for u%42%*>\ <code for u%42%*>
\ (JUMPF 1 lint)\ (JUMPT 1 loc)
\ . . .\ . . .
\ <code for u%4n%*>\ <code for u%4n%*>
\ (JUMPF 1 lint)\ (JUMPT 1 loc)
\ (JUMP loc) \ (JUMP lint)
\loc\loc
\ %1<code for %3e%*>\ <code for %3e%*>%3
\ (JUMP lout)\ (JUMP lout)
\lint\lint
.END
The label %3lint%1 indicates the next clause in the conditional
expression. Note the symmetry between the code for %3and%1 and the
code for %3or%1. There is a slight inefficiency in %3and%1 with
%3(JUMP#loc)%1 immediately followed by %3loc%1, but we can easily remove that.
.BEGIN GROUP;TABIT1(36);
Here is a %3compclause%* which will generate it:
.select 3;
compclause <=λ[[p;loc;glob;off;vpl] append[\compred[ante[p];loc;off;vpl];
\compexp[conseq[p];off;vpl];
\list[mkjump[glob];loc]]]
.END
.BEGIN TABIT3(25,44,48);GROUP;SELECT 3;
compred <= λ[[p;lint;off;vpl]\[isand[p] → compandor[\args[p];
\\off;
\\vpl;
\\list[mkjumpf[lint]];
\\()];
\ isor[p] → λ[[loc] compandor[\args[p];
\\\off;
\\\vpl;
\\\list[mkjumpt[loc]];
\\\list[mkjump[lint];loc]]][gensym[]];
\ %et%* → append[compexp[p;off;vpl];list[mkjumpf[lint]]] ]]
.END
.BEGIN TABIT2(30,41);GROUP;SELECT 3;
compandor <=λ[[u;off;vpl;inst;fini]\[null[u] → fini;
\ %et%* → append[\compexp[first[u];off;vpl];
\\inst;
\\compandor[rest[u];off;vpl;inst;fini]] ]]
.END
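A quick check of %3compandor%1 against the pattern above: with %3compexp%1
stubbed, compiling %3and[u%41%*;u%42%*]%1 as a predicate yields one %3JUMPF%*
per argument and nothing else, the final %3(JUMP#loc)%1 having been suppressed
by passing an empty %3fini%1. The stub is an assumption.
.BEGIN NOFILL;GROUP;
(defun mkjumpf (l) (list 'JUMPF 1 l))
(defun compexp-stub (e off vpl)
  (declare (ignore off vpl))
  (list (list 'CODE-FOR e)))

(defun compandor (u off vpl inst fini)
  (if (null u)
      fini
      (append (compexp-stub (first u) off vpl)
              inst
              (compandor (rest u) off vpl inst fini))))

;;; (compandor '(U1 U2) -2 '() (list (mkjumpf 'LINT)) '())
;;;   =>  ((CODE-FOR U1) (JUMPF 1 LINT) (CODE-FOR U2) (JUMPF 1 LINT))
.END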
.CENT(Problems)
%2I%* We should recognize the construct %et%*#→%3e%4i%1 in conditional expressions
and compile special code for it. We should also realize that in the construct:
.BEGIN CENTER;SELECT 3;
[p%41%* → e%41%* ... %et%* → e%4i%*; ...p%4n%* → e%4n%*]
.END
we can %6never%* reach any part of the conditional after the %et%*-predicate.
Therefore no code should be generated. Rewrite the compiler to handle these
additional observations about conditionals.
The second point, above, is a special instance of a general compiling question.
How clever should the compiler be? If it can recognize that a piece of program
can never be reached, should it tell the user or should it compile minimal code?
%2II%* Write a new %3compile%* including all the efficiency considerations
discussed so far.
%2III%1 When we apply the convention that anything non-%3NIL%1 is a representation
of truth, it is often convenient to evaluate %3and%1 and %3or%1 for "value".
That is, their value is either %3NIL%1 or non-%3NIL%1. Extend our compiler
to handle such uses of these functions.
%2IV%1 Extend the compiler to compile efficient code for compositions
of the predicates %3and%1, %3or%1, and %3not%1.
.SS(A Compiler for %3progs%*)
The compiler of this section will not compile all %3prog%*s; it is only
intended to demonstrate some of the salient features of a %3prog%* compiler.
They are:
.BEGIN INDENT 0,5,5;
.GROUP
%21.%* Handling of assignments. Since we are assuming local variables,
storage into the value stack is sufficient.
.APART
.GROUP
%22.%* The %3go%*-label pair. We will assume that this can be passed off to the
assembler.
.apart
.group
%23.%* On leaving a %3prog%*-body we have to remove the %3prog%*-variables
from the top of the stack. This is done by comparing the current %3off%*
with %3vpl%*.
.END
.BEGIN SELECT 3;GROUP;nofill;TABIT3(15,26,34);
compprog <=λ[[locals;body;off;vpl]
\λ[[n]append[\mkpushlistnil[n];
\\compbody[\body;
\\\labels[body];
\\\difference[off;n];
\\\pruploc[locals;-off;vpl];
\\\n;
\\\gensym[]]]]
\ [length[locals]] ]
.END
.BEGIN SELECT 3;GROUP;nofill;TABIT2(24,35);
pruploc <= λ[[locals;off;vpl]\[null[locals] → vpl;
\ %et%* → pruploc[\rest[locals];
\\add1[off];
\\concat[cons[first[locals];off];vpl]]]]
.END
.BEGIN SELECT 3;GROUP;TABIT1(18);
labels <= λ[[body]\[null[body] → ();
\ islabel[first[body]] → concat[first[body];labels[rest[body]]];
\ %et%3 → labels[rest[body]]]]
.END
.BEGIN SELECT 3;GROUP;nofill;TABs 15,26,39,48,51,55; turn on "\";
compbody <= λ[[body;labels;off;vpl;n;exit]
\[null[body] →list[mkconst[1;NIL];exit;mksync[n]];
\ islabel[first[body]] → concat[first[body];compbody[\rest[body];
\\\\\\labels;
\\\\\\off;
\\\\\\vpl;
\\\\\\n;
\\\\\\exit]];
\ isgo[first[body]] → append[\list[compgo[arg[first[body]];labels]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ isret[first[body]] → append[\compexp[arglist[first[body]];off;vpl];
\\\sync[off;vpl];
\\\list[mkjump[exit]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ issetq[first[body]] → append[\compexp[rhs[first[body]];off;vpl];
\\\list[mkmovem[\1;
\\\\\loc[lhs[first[body]];off;vpl]]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\ iscond[first[body]] → append[\compcondprog[arg[first[body]]];
\\\compbody[\rest[body];
\\\\labels;
\\\\off;
\\\\vpl;
\\\\n;
\\\\exit]];
\%et%* → append[\compexp[first[body];off;vpl];
\\compbody[rest[body];labels;off;vpl;n;exit]]]]
.END
.BEGIN SELECT 3;GROUP;
compgo <= λ[[x;l][member[x;l] → mkjump[x]; %et%3 → err[UNDEFINED_TAG]]];
.END
This %3compprog%1 only handles a subset of the semantics of %3prog%1.
We do not handle any non-local jumps; a new list of labels is made up on
entry to a %3prog%1 and only that set of labels is accessible for %3go%1s.
As a further restriction,
we also assume that the %3prog%1 variables are used in a strictly local
fashion.
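.BEGIN GROUP;
As a small worked illustration, consider a hypothetical %3prog%1 whose single
%3prog%1-variable is %3u%1 and whose body, in list representation, is
%3(L (SETQ U (F X)) (GO L))%1.
Tracing the definitions above: %3labels%1 applied to the body yields %3(L)%1;
%3length[locals]%1 is %31%1, so %3mkpushlistnil[1]%1 reserves one stack cell for %3u%1;
and %3pruploc%1 pairs %3U%1 with the offset of that cell on the variable-pair list.
%3compbody%1 then dispatches on each statement: the label %3L%1 is passed through
to the assembler; the assignment compiles %3(F X)%1 and stores the result into
%3u%1's cell using %3mkmovem%1; and %3(GO L)%1 becomes a jump to %3L%1 via %3compgo%1.
.END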
.CENT(Problem)
Write %3compcondprog%*.
.SS(Further Optimizations,,P258:)
This section is in the nature of hints and possibilities for expansion of the
basic compiling algorithms.
One of the first things to note about the compiling algorithm is its lack of
knowledge about what
it has in the various %3AC%*'s. Frequently the
compiled code will load up one of the registers with a quantity that is already
there. Thus the first suggestion: build a list of
what's in various registers. We know what's there when we enter the function;
whenever we perform an operation which destroys a register then we have to
update the compiler's memory. Whenever we need a quantity, we check the
memory. If the object is already in the %3AC%*'s then we use it. Clearly there is a
point at which the object stored is too complicated to be worth
remembering. However, the idea can be used quite profitably for variable references
and simple computations. This idea is a simple form of
%2common subexpression elimination%*. For example,
assuming that the compiler knows that %3x%* is in %3AC1%*, here's code for:
.BEGIN SELECT 3;CENTER
f[car[x];cdr[car[x]]].
.END
.BEGIN TABIT1(33);SELECT 3;GROUP;
\(CAR 1 1)
\(CDR 2 1)
\(CALL F)
.END
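One simple representation for such a register memory is an association list
pairing each %3AC%* with a symbolic description of its contents. The following is
only a sketch; %3acmem%1 and %3findac%1 are hypothetical names, not functions of
the compiler developed in this chapter.
.BEGIN SELECT 3;GROUP;TABIT1(25);
findac <= λ[[exp;acmem]\[null[acmem] → NIL;
\ equal[exp;cdr[first[acmem]]] → car[first[acmem]];
\ %et%* → findac[exp;rest[acmem]]]]
.END
Before emitting code to compute an expression the compiler consults %3findac%1;
whenever it emits an instruction which changes an %3AC%* it updates %3acmem%1.
In the example above, %3acmem%1 would contain %3(1 . (CAR X))%1 after the first
instruction, so the second argument can be compiled as %3(CDR 2 1)%1 rather than
by recomputing %3car[x]%1.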
This idea can be extended. There is nothing sacred about knowing only the
contents of the special registers. We could keep a history of the partial computations
in the stack. Then if we need a partial result we might find it already computed
in the %3AC%*s or stored on the stack.
We might also keep track of whether stack or %3AC%1 contents are still needed.
For example, in our compiled function %3j%1 on {yon(P165)} we might have
noticed that after the call on %3g%1, the value of %3x%1 was no longer needed;
therefore we need not save %3x%1. Similarly we don't need the value of %3y%1
after the call on %3h%1. If we build this kind of information
into a compiler, we can generate more efficient code. However,
many of these ideas must be used with some care.
Side-effects can destroy the validity of partial results.
Notice that we are comparing the %6symbolic%1 values in the %3AC%*'s or stack;
we cannot look for actual values. This idea of symbolic processing can be exploited
at a much more sophisticated level in LISP compilers.
In particular, we can perform program transformations.
For example, the compiler can rewrite program segments taking advantage of
transformations it knows. These transformations typically involve equivalence
preserving operations which might lead to more efficient compiled code.
For example, several LISP compilers have the ability to perform recursion removal,
replacing recursive programs with equivalent iterative versions⊗↓All these
transformations are typically invisible to the user.←.
Here's a case:
.BEGIN SELECT 3;CENTERIT;
.p172:
←rev <= λ[[x;y][null[x] → y; %et%* → rev[rest[x];concat[first[x];y]]]]
.END
This program is automatically rewritten as:
.BEGIN SELECT 3;GROUP;TABIT2(20,23);turn off "←";
rev <= λ[[x;y] prog[[]\l\[null[x] → return[y]];
\\y ← concat[first[x];y];
\\x ← rest[x];
\\go[l] ]]
.END
This second version makes no demands on the run-time stack to save
partial computations, as the recursive version does. Typically
the iterative version will execute faster.
A major obstacle to most kinds of optimization is the unrestricted use
of labels and %3go%1s. Consider a piece of compiled code which has a label attached
to it.
Before we can be assured of the integrity of an %3AC%1
we must ascertain that every possible path to that label maintains that %3AC%1.
This is a very difficult task. The label and goto structure required by
%3compile%* is quite simple. However if we wished to build an optimizing
compiler for
LISP with %3prog%*s we would have to confront this problem.
.CENT(Problems)
%21.%* Extend the compiling algorithm to remember what it has in its %3AC%*
registers. How much of the scheme is dependent on lack of side-effects?
%22.%* Titled: "If we only had an instruction ..."
We advocate an instruction, %3EXCH ac loc%*, which will exchange
the contents of the %3ac%1 and the %3loc%1. This instruction could be used
effectively on the code for %3j%* on {yon(P163)} to save a %3PUSH-POP%*
pair.
.BEGIN GROUP
Here's %3EXCH%* in action, using the results of the previous exercise:
.TABIT2(10,45);SELECT 3;
\((LAP J SUBR)\%1; says %3j%* is a function%3
\ (PUSH P 2)
\ (CALL G)\%1; call the function named %3g
\ (EXCH 1 0 P)\%1; save the value and dredge up %3y
\ (CALL H)\%1; call %3h
\ (MOVE 2 1)
\ (POP P 1)\%1; preparation for%3
\ (CALL F)\%1; calling %3f
\ (RET))\%1; exit.%3 AC1%* still has the value from %3f.
.END
Look for general situations where %3EXCH%* can be used. In your tour
through the compilers, try to imagine other useful instructions.
%23.%* Write code for the factorial function, and simulate
the execution on %32!%*.
%24.%* Write a LISP function to take recursive schemes into equivalent iterative
ones in the style of the %3rev%* example on {yon(P172)}. Your first
version need not be as efficient as the one advertised there, but try to make it
better as you proceed. See ⊗↑[Dar#73]↑ for this and related transformations.
.SS(Functional Arguments,,P3:)
Function variables add more complication to the compiling algorithms.
We will address the simpler cases of functional arguments.
There are two issues involved: what to compile when a %3function%1 construct
is recognized; and what to do when the function position of an application
is a variable.
.GROUP;
Consider an example:
.BEGIN CENTER;SELECT 3;
%3foo[ ...;function[%7f%3];...]%1
.END
%1We generate %3(MOVEI 1 %7f%3)%1 and compile %7f%1 if it is a
λ-definition; otherwise we essentially generate %3(MOVEI 1 (QUOTE %7f%3))%1.
Assume %3foo%1 is defined as:
.BEGIN CENTER;SELECT 3;
foo <= λ[[ ...;g; ...] ....g[t%41%3; ...;t%4n%3]]
.END
.APART
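For instance, a hypothetical call %3foo[a;function[PRT]]%1, in which the named
function %3PRT%1 is to be bound to %3g%1, compiles its functional argument simply as
.BEGIN CENTER;SELECT 3;
(MOVEI 1 (QUOTE PRT))
.END
whereas an argument %3function[λ[[y] car[y]]]%1 would cause the λ-definition to be
compiled separately and a reference to that compiled code to be loaded instead.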
.GROUP;
The instance of %3g%1 in
%3g[t%41%3; ...;t%4n%3]%1
is a special case of a computed function ({yon(P227)}); in this case, the computation is
only a variable lookup. We will display the more general code for
a computed function call of the form:
.BEGIN CENTER;SELECT 3;
exp[t%41%3; ...;t%4n%3].
.END
We get:
.BEGIN SELECT 3;GROUP;TABIT2(25,32);
\append[\compexp[exp;off;vpl];
\\list[mkalloc[1]];
\\complis[(t%41%3; ...;t%4n%3);off-1;vpl];
\\list[mkalloc[1]];
\\((CALLF n 0 P));
\\((SUB P (C 1)))]
.END
.APART
The calling structure for a functional argument
is slightly different. The arguments are on the stack but,
more importantly, note that the call must always be
trapped and decoded. We cannot replace that call with
a %3PUSHJ%1 to some machine language code for the function because the
function referred to can change. We use a %3CALLF%1 instruction
to designate a call on a functional argument.
If the function is a variable name, we cannot know at compile time what to
%3PUSHJ%* to. Since the value of the expression may very well change
during execution we can %6never%* replace the %3CALL%* with a %3PUSHJ%*.
Often, unneeded
generality allowed by functional arguments can be compiled
out.
Production LISP compilers, like the MacLISP compiler, are able to compile
efficient code for many instances of the LISP mapping functions, like
%3maplist%1; in the general case the code must expect arbitrary functional
arguments.
The problems of compiling efficient code become magnified if
generalized control structures are anticipated. The problem is similar
to that of recognizing an implied loop construct in a program
using labels and %3go%1's to control the algorithm. Control constructs
like %3catch%1 and %3throw%1 ({yon(P195)}) have some advantages
here; rather than using evaluation relative to
arbitrary access and control environments (⊗↑[Bob#73a]↑),
these constructs do impose some regularity which the compiler and
the programmer can exploit.
.SS(Macros and Special Forms,macros,P57:)
We now wish to extend our compiler to handle macro definitions.
Consider the example of defining %3plus%* of an indefinite number of arguments
given on {yon(P58)}.
In the presence of a compiler we can frequently make macros execute
more efficiently than their special form counterparts.
The distinction is that macros only involve transformations
which can be executed at compile time, whereas a special form
may involve run time information.
For example, consider the case of the macro definition of %3plus%1.
When %3plus%* is called we know the number of arguments, and can simply
expand the macro to a nest of calls on %3*plus%*. For example:
.begin centerit;
%3
←plus[4;add1[2];4] %1expands to%* *plus[4;*plus[add1[2];4]] %1
.end
which can be efficiently compiled.
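The compiler's treatment of a macro call is easy to sketch. Assuming a hypothetical
recognizer %3ismacro%1, a selector %3fun%1 which extracts the function-part of a
call, and a hypothetical %3expand%1 which applies the stored expansion function to
the entire form, the relevant clause in %3compexp%1 would be roughly:
.BEGIN CENTER;SELECT 3;
ismacro[fun[exp]] → compexp[expand[exp];off;vpl];
.END
That is, the expansion is performed at compile time and the result is compiled in
place of the original call; for %3plus[4;add1[2];4]%1 it is the %3*plus%1 nest shown
above which %3compexp%1 finally sees.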
Macros can also be used effectively in implementing abstract data structures and
control structures. For example,
the constructors, selectors, and recognizers which help characterize
a data structure can be expressed as very simple S-expr operations.
These operations are performed quite frequently in a data structure algorithm
and so any increase in their running efficiency will be beneficial.
Recall that on {yon(P60)} we defined %3coef%*
as %3car%*. Compiled calls on %3coef%* would invoke the function-calling
mechanism, whereas many compilers can substitute actual hardware instructions
for calls on %3car%*, resulting in more efficient run-time code.
It would be better to write %3car%* instead of
%3coef%*. There are two objections to this. First, %3coef%* has more
mnemonic significance than %3car%*. Second, using %3car%* we have explicitly
tied our algorithm to the representation. Both are strong
objections.
Macros can help overcome both objections. Define:
←%3coef <%4m%*= λ[[l] cons[CAR;cdr[l]]]%1.
The user writes %3(COEF ...)%*; the evaluator sees %3(COEF ...)%* and
evaluates %3(CAR ...)%*; the compiler sees %3(COEF ...)%*
and compiles code for %3(CAR ...)%*. With macros, we
can get the efficient code, the
readability, and the flexibility of representation.
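For example, applying the macro definition above to an instance of %3coef%1:
.begin centerit;group;
%3
←(COEF (CDR X)) %1expands to%* (CAR (CDR X)) %1
.end
and the compiler can then substitute hardware instructions for the %3CAR%* and %3CDR%*.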
Macros can also be used to perform most of the operations which special forms
are meant to do.
Since %3eval%* handles calls on special forms, we should examine the
extensions to %3compile%* to generate such code. We have seen that
in compiling arguments to (normal) functions, we generate the code for
each, followed by code to save the result in the run-time stack, %3P%*.
The argument to a special form is %6unevaluated%*, by definition. All we
can thus do for a call of the form %3f[l]%*, where %3f%* is a special form,
is pass the argument, compiling something like:
.BEGIN CENTERIT;SELECT 3;GROUP;
←(MOVEI AC1 (%eR%f(%3 l %f)%3))
←(CALL 1 (E F))
.END
We have already mentioned some of the dangers in using special forms;
the fact that a compiler cannot do much with them either
makes them even less attractive.
.CENT(Problems)
I. Extend the last %3compile%* function to handle macros.
II. Assume ⊗→%3and%*↔← allows an indefinite
number of arguments. Show how to modify %3compile%* to recognize %3and%*
and compile efficient code for its execution.
III. Define %3and%1 as a macro in terms of %3cond%1. Compare the code
produced in the two cases. How could you improve the compiler to make the two
sets of code more nearly alike?
.SS(Compilation and Variables,non-local variables,P5:)
.BEGIN TURN ON "#";
The models of compilation which we have sketched so far store
their λ-variables in the stack, %3P%*. References to those
variables in the body of the λ-expression are made to those
stack entries. This scheme suffices only for lambda or %3prog%*
variables which are used in a strictly local fashion.
We have said that λ-expressions may refer to global
or free variables. The lookup mechanism simply finds the latest
binding of that variable in the current symbol table; this is LISP's
dynamic binding strategy.
There are potential difficulties for compiled code
when dealing with dynamic binding.
The problem involves reference to variables which are currently λ-bound
but are non-local. Such variables
are called %2⊗→special variable↔←s%1.
Assume first that we are attempting
a deep binding implementation. If
all we store on the stack is the value of a
variable, then another program which expects to use that value
will have no way of finding that stored value. One scheme
is to store name-value pairs on the stack; then we
can search the stack for the latest binding.
This scheme is compatible with the stack implementation of deep binding given
in {yonss(P148)}.
The compiler can still `know' where all the
local variables are on the stack and can be a bit clever about
searching for the globals or special variables.
Shallow binding implementations offer an alternative. We can still store
variables on the stack⊗↓We assume throughout this discussion that we are
compiling code for the stack implementation of shallow binding
as given in {yonss(P225)}.← if we are
sure that the variable is used in a strictly local fashion.
If a variable is to be used as a special variable then
the compiled code should access the
value cell of that variable.
The compiler recognizes a variable as special by looking for the
property name %3SPECIAL%1 on the property list of the atom; if the property
exists and its
value is %et%1 then the variable is a special variable and
the compiler generates different code.
When a variable, say %3x%*, is
declared special the compiler will emit a reference to %3x%* as
%3(GETV#AC%4i%*#X)%1
or %3(PUTV#AC%4i%*#X)%1
rather than the corresponding
reference to a location on the stack.
%3GETV%* and %3PUTV%* are
instructions to access or modify the contents of the value#cell.
When the LISP assembler sees
one of these instructions, it will locate the value#cell
of the atom and assemble a reference to that cell. Since the location
of the value#cell does %6not%* change, we can always find the current
binding. Any
interpreted function can also sample the value#cell so non-local values
can be passed between compiled and interpreted functions.
Non-local variables therefore necessitate some changes to our compiler:
we have to be a bit careful in dealing with special variables.
Assume a function %3f%* calls a function %3g%*, and assume that
%3g%* uses some of %3f%*'s λ-variables.
The usual compilation for %3f%* would place the λ-variables in the stack
and they would not be accessible to %3g%*. Our compiler must therefore be
modified to generate
different prolog and epilog code for special variables.
The code must save the old contents of each special value#cell
on entry to the function, and the compiler must generate code to restore
those cells at function exit.
As we described above, any references in either %3f%1 or %3g%1
to those special variables
will involve %3GETV-PUTV%* rather than
references into the stack %3P%*. In this scheme, %3lookup[x;env]%1
is given by %3getv[x]%1.
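The effect on function entry and exit can be sketched as follows. Assume a
hypothetical function whose λ-variable %3X%1 has been declared special and whose
actual parameter arrives in %3AC1%1; the register usage is illustrative only.
.BEGIN TABIT2(10,45);SELECT 3;GROUP;
\ (GETV 2 X)\%1; prolog: fetch the old contents of the value cell of %3X
\ (PUSH P 2)\%1; save them on the stack%3
\ (PUTV 1 X)\%1; install the actual parameter as the current binding%3
\  ...\%1; body: references to %3X%1 go through %3GETV %1and %3PUTV
\ (POP P 2)\%1; epilog: retrieve the saved contents%3
\ (PUTV 2 X)\%1; and restore the value cell%3
.END
Local variables which are not special are still kept on the stack %3P%* as before.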
.END
Non-local variables cause several problems in LISP. The
simple mechanism we used for referencing local variables is no longer
applicable. Other programming languages allow the use of non-local variables,
some, like APL (⊗↑[Ive#62]↑), only allow global variables; others,
like Algol60 and its
successors (⊗↑[Alg#63]↑ and ⊗↑[Alg#75]↑), allow free as well as global variables.
However, Algol compilers are much simpler to construct than LISP compilers,
and we should explore some of the reasons⊗↓For reasons other
than those we are addressing here, APL compilers are
difficult to construct.←.
One simplification in Algol is its treatment of procedure-valued
variables. Algol dialects typically restrict themselves to
what LISP calls functional
arguments. Algol
dialects do not allow arbitrary procedures to be returned as values.
Their restrictions
allow the run time environment to be modelled in a stack, typically
as described in {yonss(P148)}.
The difference between LISP and Algol, which is more to the point here, is
their binding strategies#({yonss(P134)}). A typical LISP uses dynamic binding
whereas Algol uses static binding. The difference is that Algol translators
determine the bindings of variables at the time the definition is made
whereas LISP determines bindings at the time the function is applied.
That is, definitions in Algol always imply an application of %3function%1,
binding up all non-local variables.
The net effect is that Algol does not have free variables in the
sense that there is any choice of bindings; all choices
have been made when a procedure is declared. That
binding decision has dramatic
results when we come to implement language translators. As a result,
Algol can effectively compile all variable references
to be references into the run time stack, and need not retain the
name stack for variable lookup at run time. It is not at all clear
yet which binding strategy will dominate. Counterexamples to exclusive
use of either strategy exist. Recent work (⊗↑[Ste#76b]↑, ⊗↑[Sus#75]↑, ⊗↑[Hew#76]↑)
points to the use of static binding in LISP-like languages.
A typical preserve of dynamic binding is that of interactive programming,
where programs are created as separate entities to be connected into
a cohesive whole at some later time. Frequently one wants the bindings
of the free variables to be postponed until such a grouping is made.
.SS(Interactive Programming,,P239:)
.BEGIN INDENT 20,20;
"... What is actually happening, I am afraid, is that we all tell each other
and ourselves that software engineering techniques should be improved
considerably, because there is a crisis. But there are a few boundary
conditions which apparently have to be satisfied. I will list them
for you:
%21.%1 We may not change our thinking habits.
%22.%1 We may not change our programming tools.
%23.%1 We may not change our hardware.
%24.%1 We may not change our tasks.
%25.%1 We may not change the organizational set-up in which the work
has to be done.
Now under these five immutable boundary conditions, we have to try to
improve matters. This is utterly ridiculous. Thank you. (%3Applause%1)."
.END
.BEGIN TURN ON"→";
→E. Dijkstra, %3Conference of Software Engineering, 1968%1.
.END
We have talked about the constructs of LISP; we have talked about
interpreters and compilers for LISP; and we have talked a little about
input and output conventions for the language.
The combination of these properties leads us to a most interesting
practical aspect of LISP as a programming language: its use as an
interactive programming language. A programming language is a tool
for building programs. LISP's
representation of programs as data structures coupled with
the availability of display terminals offers the LISP programmer
unique opportunities for the interactive construction of programs.
Historically, machines have been oriented towards the rapid
execution of well-defined numerical algorithms. This perspective
overlooks two important points.
First, the actual process of discovering,
refining, and encoding the algorithm is itself complex.
In the early days of computation,
the programmer performed the analysis
on the algorithm on paper, transcribed the algorithm to a programming
language, encoded that program on coding sheets and keypunched a
card deck. That deck was supplied with data cards and presented to the machine.
If some
abnormal behavior was detected in the program, the programmer was
presented with an uninspiring octal dump of the contents of
memory. Memory dumps were an appalling debugging technique, even then.
Programming was frequently done in a higher level language so the contents
of most machine locations had little meaning to the programmer. Also the
state of the machine at the time the dump was taken frequently had only
a casual relationship with the actual bug.
The programmer had a slightly more appealing alternative
called %2tracing%1.
A program could be embellished with print statements whose purpose was to
determine access behavior by printing values of variables,
and to discover control behavior by printing messages at entry and exit from
procedures. This tracing technique was frequently available as an
operating system option. Then the programmer would supply control cards which
expressed what kind of output was desired. In either case,
unless this technique was used with resolute precision, the output
would either be voluminous or uninformative, or both.
When the cause of a bug was
discovered the offending cards were replaced and the
deck was resubmitted for execution.
This cycle was repeated until
an acceptable program was developed. This approach is still followed
by a majority of programmers. What is missed is that
much of the detail and tedium of these early phases of program development
can be aided by a well-constructed interactive programming system.
The second point which is overlooked is that a large class of interesting
problems are not included in the range of "well-defined numerical algorithms".
In fact most of the problems which are traditionally attacked by LISP
programs fall into this class: language design, theorem proving,
compiler writing, and of course, artificial intelligence.
In such "exploratory programming" it is often the case that no well-defined
algorithm is known, and it will be the final program which %6is%1 the
algorithm. Such exploratory programming requires that the programming language
be usable as a sophisticated "desk calculator". It requires experimentation with,
and execution of, partially specified programs; that is, the ability to
develop and run pieces of programs; to build up a larger program from pieces;
to quickly modify either the program text or the computational results themselves,
before the programmers have lost their train of thought.
An important outgrowth of such exploratory programming is a LISP technique called
"throw-away implementation". In developing a large programming system
one begins with a few simple ideas and incrementally develops the larger system.
If some of the ideas are inadequate they are replaced. At some stage an adequate
model is developed; it may be lacking in some aspects, but it does model the
desired phenomenon. At that point the programmer's understanding has
been sufficiently improved that the implementation should be thrown away and a
new, more enlightened version created.
Certainly this kind of programming can be accomplished with languages other
than LISP; the point is that an interactive LISP environment is
sufficiently responsive that an implementation cycle is quite short and
relatively painless. The effect is that the programmer does not invest
so much effort in an implementation that he feels a compulsion to
patch an inferior implementation, rather than start afresh.
The development of an interactive programming system has several important
implications. The usual partitioning of program preparation into editing,
running, and debugging is no longer adequate. The text editor and the debugger
are
integral parts of the system. A programming "environment" is established
in which all facets of programming are integrated. The tasks which are usually
performed by an operating system are subsumed by the programming language.
The idea of a separate file system becomes obsolete, and all programs and
data are accessible from within the "environment". This has an important
advantage in LISP-like representations of programs: the conversion from
internal representation to "text file" format is eliminated. The
technique puts an added burden on naming facilities so that a programmer's
definitions are accessible, but unambiguously addressable.
The effect is to structure the interactive environment as a very large data base
containing programs and data structures. The programmer has accessing procedures
to manipulate the elements in the base. All the items in the base
are accessible as data structures;
the editor can modify any objects in the base.
Some of the objects can be executed as procedures; the evaluator is
responsible for this. A procedure object can be further expanded into
machine instructions for faster execution; this may either be done by an
explicit call on a compiler, or better yet, be done invisibly
by a compiler/interpreter. If an evaluation is not performing
as expected, yet another data structure manipulating program is available.
The debugger is able to manipulate both the program structure and the run#time
data structures which the evaluator has created. Any of these data structure
manipulating programs is able to call any other program. This
view of program development is in direct conflict with the
traditional approach which grew from the card#deck philosophy.
Several current research projects are developing along these lines;
among them are ⊗↑[Hew#75]↑, ⊗↑[Int#75]↑, and ⊗↑[Win#75]↑. All of these
projects are based on LISP.
It is not accidental that LISP is the major language for these kinds of
programming tasks; it is the features of the language which make it
amenable to such programming tasks. In the next two sections
we will discuss two of the ingredients of interactive programming
systems; this will illuminate the features of LISP which make it
important for interactive programming.
First we will sketch some basic features of LISP text editors. This will
show some of the benefits of having the program structure available as
a data structure. The succeeding section will discuss a few features of a typical
LISP debugging system; this will further demonstrate the advantages of having
a natural program representation available at run time.
There is no standard LISP editor or debugger.
Therefore the next
sections will contain general information rather than
an abundance of concrete examples.
The design of such devices
is a subject of very personal preferences and prejudices. Some characteristics
are common and those we will stress. A related difficulty is that editing and
debugging aids are best implemented as interactive display programs, rather
than as key-punch or teletype programs⊗↓That is one of the author's
preferences and prejudices.←.
Interactive programming is a very visual and dynamic enterprise;
teletype-oriented interaction is not sufficient since it
results in a presentation more like a comic strip than a movie.
.SS(LISP Editors,,P247:)
A LISP editor is just another LISP program; it operates on
a data structure. In this case the
data structure represents a program. A simple editor could be constructed
along the lines of the %3subst%1 function:
.BEGIN SELECT 3; GROUP; TABIT2(17,25);
subst%9'%3 <= λ[[x;y;z]\[atom[z] → [equal[y;z] → x; %et%3 → z];
\ %et%3 → cons[\subst%9'%3[x;y;car[z]];
\\subst%9'%3[x;y;cdr[z]]]]]
.END
That is, we would let %3z%1 be the program; %3y%1, the piece of
text to be replaced; and %3x%1, the new text.
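For example, a hypothetical global edit which replaces every occurrence of %3CAR%1
by %3FIRST%1 throughout a definition %3def%1 could be expressed as:
.BEGIN CENTER;SELECT 3;
subst%9'%3[FIRST;CAR;def]
.END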
Such global editing %6is%1 useful, but control of text editing
is necessary.
A typical editor will take an expression as input and will
then enter a "listen-loop", waiting for commands from the user.
The input expression may either be a list representing a constant
data structure, or a list representing a (constant) function.
There are commands for the selection of a subexpression of the
input expression; and there are commands for the replacement
of expressions with other expressions.
The mechanics of presenting list structure to the user are interesting.
Since a list may be deeply nested, we need a convenient way of designating
subexpressions. On a display device we might display the selected expression
more brightly than the containing expression. That is useful, but
more structured representations of text are required.
Typically, a "pretty-printed"#({yonss(P226)}#or#{yon(P245)})
version of the text is presented.
If the text is too extensive to fit on the display face, then
abbreviational devices are available.
If the text is deeply nested it is often difficult to
perceive the top level structure even in pretty-printed form.
Consider the S-expr representation of the definition of %3member%1:
.BEGIN SELECT 3;TABIT1(10);group;
(MEMBER
(LAMBDA (X L)
(COND\((NULL L) NIL)
\((EQ X (FIRST L)) T)
\(T (MEMBER X (REST L))))))
.END
In this case the structure of the %3COND%1 is discernible, but
the top level structure of the definition is clearer if we express it as:
.BEGIN SELECT 3;TABIT1(10);
\(LAMBDA (X L) (COND & & &))
.END
.BEGIN group;centerit;
Or given a linear list:←%3(%7a b c d e f g h i j k l%3)%1
.END
it may be more instructive to display it as:
.BEGIN group;centerit;
←%3(%7a b c d e f g h i %3... )%1 or,
←%3( ...%7d e f g h i %3... )%1.
.END
where the focus of attention is controlled by the user.
There should be commands to move selected subexpressions
to different locations within the same structure and move expressions
to other structures. Since a common text error in LISP is the
misplacing of parentheses, there are commands to move parentheses.
There are at least two reasons for text editors: programmers make errors,
and programs which are correct need to be modified to perform other
tasks. Particularly in exploratory programming, the "try-it-and-see" attitude
must be supported. Thus we demand a flexible editor which can
"undo" changes to functions or constant data structure.
LISP editors have the ability to save the current
edit structure so that an "undo" can restore that state if
needed.
Regardless of the idiosyncrasies of a particular editor
the common feature is that LISP editors are structure-oriented
editors. They operate on well-formed S-expressions, not
text strings or card images.
A program is not a linear string of characters; it is a
structured entity, whose parts are distinguishable
as representations of instances of programming language constructs.
The editing process
should take cognizance of that structure⊗↓ A case can be made for
believing that the program %6construction%1 process should also
be driven by that structure ⊗↑[Han#71]↑.←.
.SS(Debugging in LISP,,P168:)
Few areas of the Computer Science field are as primitive
as that of debugging. Few areas of the field are as important. Getting a
correct program is indeed the whole point of our programming.
The power of our debugging techniques has been directly related to the
sophistication of the hardware/software interface which is available.
Not until the advent of sophisticated on-line systems has there really
been any hope for practical "correct-program" construction.
Several pieces of information are required to do interactive
debugging. We need an indication of the error condition; we need the
current state of the computation; we need to have some indication of
how the computation arrived at the error condition; and, if
interactive debugging is to be meaningful, we need the ability
to modify the computation and resume execution in that modified environment.
This last point is quite important; it has implications for programming
style. First, we %6should%1 hope to modify an errant calculation
rather than restart the entire computation. To start over is like
repunching a whole card deck because one card was wrong. We repunch
the offending cards and retain the rest. Similarly we should explore the
possibilities of throwing away offending computations and retaining the
rest. Typically, a computation is not as local and excisable as
removing a single card; a primary purpose of most computation is to
pass a result to some other procedure. However, if we try to localize the
effects of each procedure to simple parameter passing and value returning
then we have a better chance of discovering a point in the computation history
which is prior to the error; return the control stack to that point;
modify the erring procedure and restart the computation. This
implies that procedures should minimize their use of side-effects;
for it is side-effects which spoil the nice applicative behavior
and will require the programmer to make explicit modifications in the
computational environment before a computation can be restarted.
Such a programming style is called %2⊗→modular programming↔←%1, meaning
that each procedure is a module, or black box, dependent on other
procedures only by way of well-defined input and output considerations.
It is this style of modular programming which will enhance the use
of interactive debugging tools.
This section will deal only with the primitive mechanisms which underlie
LISP debugging techniques. The discussions of more complex tools
which are available or are contemplated
are well-documented in other sources; ⊗↑[Int#75]↑, ⊗↑[Moo#74]↑.
Crucial pieces of information about the
behavior of the program are: which functions are being entered, what
are the actual parameters, and what are the values being returned.
Assume that we wish to monitor the behavior of the function, %3foo%*. We
will place the real definition of %3foo%* on another symbol table entry
(using %3gensym[]%*) and redefine %3foo%* such that, when it is called, it
will:
.BEGIN narrow 10;;
%21.%* print the values of the current actual parameters.
%22.%* use %3apply%* to call the real definition of %3foo%* with the current parameters.
%23.%* print the value of the call on %3foo%*.
%24.%* return control and the value to the calling program.
.END
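A sketch of such a redefinition for a one-argument %3foo%* follows; the generated
name %3G00037%1 and the printing routines %3prtargs%1 and %3prtvalue%1 are
hypothetical, and the environment argument to %3apply%* is suppressed.
.BEGIN SELECT 3;GROUP;TABIT1(25);
foo <= λ[[x] prog[[v]\prtargs[FOO;list[x]];
\v ← apply[G00037;list[x]];
\prtvalue[FOO;v];
\return[v]]]
.END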
This technique is called tracing, and the description we have given is one
suitable to a teletype-like device. Given an interactive display and a well-defined
"LISP machine" description like that in %3peval%1#({yonss(P211)}),
a much more satisfactory trace can be given.
Since %3foo%* may be recursive we should also give some
indication of the depth of recursion being executed. Now every call
on %3foo%* will give us the pertinent statistics. Interpreted calls on
%3foo%* will go through %3eval%*, and if %3(CALL ... FOO)%* is being used in the
compiled code the submonitor can pass control to the tracing
mechanism; but if the call is a %3PUSHJ%*, the control passes
directly to the machine language code and we will not intercept
the call.
On most implementations of LISP the %3CALL%* may be replaced
by a %3PUSHJ%1. This is done to speed execution, since a %3PUSHJ%1
executes at machine speed, transferring to a known location, whereas
the %3CALL%1 is passed to %3decode%1#({yon(P243)}) and
the function definition is looked up. Typically there is a
programmer-controlled mechanism for replacing %3CALL%1s with %3PUSHJ%1s⊗↓As
we have seen, %3CALL%1s to functional variables won't be replaced.←.
After a program is
debugged, the programmer may replace the %3CALL%* with the
%3PUSHJ%* and the
programs will execute faster. On some implementations this action is
even reversible#(⊗↑[Ste#pc]↑); a table is kept, relating the %3CALL%1s
to the %3PUSHJ%1s; when tracing is desired, the %3CALL%1 version is made
available⊗↓Actually, the scheme is as follows: instead of
assembling the %3CALL%1 into a memory location, an XCT is assembled;
the XCT references a copy of a table which contains the actual %3CALL%1.
The user may replace the %3CALL%1s by %3PUSHJ%1s, but also
has the original table available to replace the modified version
when tracing is desired. This XCT trick has the additional benefit
of allowing several users to share a "pure" piece of program, while
some people are tracing and some people are not.←.
A variant of this tracing scheme can be used to monitor %3SET%*s
and %3SETQ%*s. We can modify their definitions to
print the name of the variable and the new value, perform
the assignment, and
return. This technique can be defeated by compiled code, however. If we
compile local variables as references into the value stack, we have lost both
the names and the ability to trace their behavior.
Variable references which use %3PUTV%1 and %3GETV%1 can be
traced like %3CALL%1. In fact %2SM%1 has operations analogous to %3PUSHJ%1,
so the %3CALL-PUSHJ%1 technique is open to us:
%3PUTV%1 and %3GETV%1 can be implemented as hardware %3MOVEM%1 and %3MOVE%1
instructions.
The trace facility is a debugging feature which has been
adapted from the batch-processing versions of LISP. There is a related,
but more interactive, version of this technique called the %2break%1 package.
In this mode of tracing, the user can specify that the program should halt
on recognition of certain conditions. If that halt occurs, the break package
is entered and the user may then type commands which survey the state of the
computation. Expressions may be evaluated, which may themselves enter the break
package recursively. If desired, the LISP editor may be called
either to edit function definitions or to edit an expression on the
actual control stack of the current computation.
Since it is difficult to predetermine when a computation may
require debugging, several systems supply an interrupt system
analogous to that found in hardware machines. Striking certain keys
may then cause interrupts to the break package, just as if a break condition
were pre-programmed. Such a feature is useful in conjunction with the
trace package. If a trace indicates to the user that the computation
is not performing according to expectation then an interrupt key can be
struck and the computation will be suspended.
User-definable interrupt systems apply to areas of computation other
than debugging. The most well-developed system is that
of MacLISP.
The ability to selectively trace the execution, coupled with the ability to
interrupt a computation, allows the user to
examine computations which are suspected of divergence.